Unite against this
attack on scientific 
evidence 


US environment agency must desist 
from a course that could harm the 
health of people and the planet. 


The US Environmental Protection Agency (EPA)
last week surpassed its own recent record of 
getting publicity for the wrong reasons. 
The New York Times revealed that the agency’s 
leadership is still actively discussing a rule that
would require scientists to supply it with the raw data for 
studies if the findings are to be taken into consideration 
in the drafting of environmental regulations (see p. 420). 
The EPA announced its desire for such a rule, which it is 
calling Strengthening Transparency in Regulatory Science, 
in April 2018. It is needed, the EPA says, so that the agency 
can independently reanalyse and revalidate scientific data 
and models. The EPA says that it will not recognize studies 
unless scientists agree to supply such data. 

Let us consider the implications of such a rule, were it to
be adopted. Many of the data that underpin public-health 
and environmental studies include information about peo- 
ple who will not have consented to disclosing their confi- 
dential data, including where they live; their travel habits; 
their age and gender identity; and the state of their health. 

Many such data were integral to the Six Cities study, 
published in 1993 by what was then the Harvard School of 
Public Health in Boston, Massachusetts. This work revealed 
that people living in polluted cities have shorter lives than 
people in cleaner cities (D. W. Dockery et al. N. Engl. J. Med.
329, 1753-1759; 1993). The results of the Six Cities study 
led directly to the imposition of life-saving limits on fine 
particulate matter from emissions. But this research would 
have been inadmissible under the EPA’s proposed rule. 

So the question has to be asked: is there a problem in how
science is assessed that needs fixing? Why would the EPA 
wish to create a rule that could risk worsening human and 
planetary health? Why would the EPA’s leaders choose to 
override their own science advisers, who questioned the 
rule? Even the US Department of Defense said in August 
2018 that the absence of underlying data “should not 
impede the use of otherwise high-quality studies”.

Answers might be found by considering the rule in the 
context of the wider actions of the administration of Pres- 
ident Donald Trump on the environment so far. Whether 
it’s cancelling the Clean Power Plan — the previous admin- 
istration’s signature climate policy — withdrawing from 
the Paris climate agreement, weakening fuel-efficiency 
standards or cutting back on environmental research, the 
US administration is choosing to act against the consensus
of the scientific community. The Strengthening Transparency
in Regulatory Science rule needs to be viewed against
the backdrop of this reality. 

The EPA has denied that the rule would be applied 
retrospectively, or to existing environmental standards. 
That might be true up to a point. But what would happen 
when existing standards needed to be reviewed — as most 
periodically are? Would the rule be applied because the 
reviewed version would be a future standard? And, if so, 
would any science — new or old — become inadmissible 
unless the underlying data and models were supplied? The 
EPA has yet to clarify what would happen in such a scenario,
but last week’s revelations had the result of once again 
uniting the United States’ scientific, medical and health 
communities, and culminated in a crescendo of opposition.

The scale and volume of this response should rattle the 
EPA’s leadership, and the response needs to get bigger and 
louder still. That will compel the agency to conduct more of 
the discussion around its rule in public, as it is now doing. 
Institutions and individuals must redouble their efforts. 
They must write to their elected representatives to call out 
this attempt to undermine accepted scientific practice in 
public-health and environmental standards. 

The EPA was created to protect the nation’s environment. 
As it approaches its 50th birthday next year, it must not be 
allowed to continue on a course of action that will weaken
its ability to fulfil that role. 


Germline editing 
needs one message 


Science academies and the World Health 
Organization must speak with one voice 
on human germline genome editing. 


A year ago this week, geneticist He Jiankui made
the shocking announcement of the birth of 
twin girls in China whose genomes had been 
edited to prevent HIV infection. Undeterred by 
the global opprobrium heaped on He, Russia’s 
Denis Rebrikov told Nature last month about more experi- 
ments involving gene editing of human eggs, to help deaf 
couples give birth to children who would lack the genetic 
mutation carried by their parents that impairs hearing. 

At the same time, every month seems to bring another 
gene-editing advance. The latest tool, a precision ‘search 
and replace’ technique called prime editing, was described 
in Nature last month by David Liu at the Broad Institute 
of MIT and Harvard in Cambridge, Massachusetts, and 
his colleagues (A. V. Anzalone et al. Nature http://doi.org/dczp;
2019). Randall Platt at the Swiss Federal Institute of
Technology (ETH) in Basel called it a “giant leap” towards 
the goal of making specific changes to the blueprint of life. 

The speed of technological advance, coupled with some
scientists’ determination to press ahead with editing
human germline cells — eggs, sperm and embryonic cells — 
has been sounding alarm bells for nearly five years. Editing 
could produce unpredictable changes that an individual’s 
descendants will inherit — with potentially wide-reaching 
societal implications. Academies, governments and eth- 
icists have been considering how to regulate this. But the 
manner in which it is being done is suboptimal. 

In 2018, the World Health Organization (WHO) set up 
an independent expert panel to advise on the oversight 
and governance of human genome editing. A separate 
international commission on the clinical use of human 
germline genome editing gathered for its second meeting 
in London last week. This commission was established by 
the US National Academy of Sciences, the US National Academy
of Medicine and Britain’s Royal Society, to recommend
standards and criteria for germline genome editing. Both 
will report next year, and the commission’s report will feed 
into the WHO process. 

But the WHO panel has already recommended setting 
up a public registry for genome-editing experiments. It 
has also made an interim recommendation that “it would 
be irresponsible at this time for anyone to proceed with 
clinical applications of human germline genome editing”, 
which has been accepted by the agency’s leadership. The 
international commission has yet to say what it thinks, but
it would make little sense for it to disagree. 

It isn’t entirely clear why separate initiatives are needed,
and it is unfortunate that representatives of people with 
disabilities are not part of the decision-making process. 
However, it isn’t too late to rectify these issues, and the two 
initiatives must, in the end, converge. 

There are very real risks that unregulated clinics claiming 
to be able to eliminate inherited conditions will use untested,
possibly harmful procedures. A sure-fire way to give such 
clinics the green light is an absence of agreed global stand- 
ards. When the two groups report next year, they must speak 
with one voice and have more inclusive representation. 


A shock to
the system 


California’s universities must help to design 
and build a clean and resilient power grid.


Confusion reigned the first time that the
University of California, Berkeley, lost its 
connection to the city’s electricity grid, on 
9 and 10 October. Campus officials were unable
to say how long the university’s power plant 
could provide emergency electricity for crucial facilities — 
such as freezers containing valuable research specimens.
Some scientists didn’t even know which electric plugs to 
use to access back-up power. As a precaution, researchers
packed freezers with dry ice, and some sent their most
important samples to other institutions. 

This chain of events can be traced back to last November, 
when a faulty transmission line sparked the deadliest wild- 
fire in California’s history. The Camp Fire tore through the 
town of Paradise, killed 86 people and levelled thousands 
of homes and businesses. 

Faced with an estimated US$30 billion in insurance 
claims from that fire and others in 2017, the state’s largest 
utility provider, San Francisco-based Pacific Gas and Electric
Company (PG&E), filed for bankruptcy in January. Then,
when hot, dry winds raised the fire danger in early October, 
the company cited legitimate liability concerns and shut 
down major sections of the electricity grid to prevent more 
blazes from breaking out. 

Evidence that global warming is promoting more 
frequent and severe wildfires has been mounting for 
decades, and the fact that electrical equipment can start 
fires, and contribute to their spread, is hardly news. But few 
could have predicted that vast stretches of California — the
world’s fifth-largest economy and a global hub for research 
and innovation — would be paralysed by a combination of 
wildfire and electricity blackouts. 

Safeguarding lives and habitats from these catastrophes 
has to be the top priority for the state’s decision makers. 
Solutions for upgrading the grid range from the obvious 
to the technological. Electrical equipment should be kept
clear of vegetation, with power lines buried underground, 
where feasible. Cameras, sensors and other systems could 
allow grid operators to detect and isolate problems with 
speed and precision. There are also measures that Berke- 
ley and other institutions can take, such as reducing their 
energy demands and allocating limited emergency power 
to only the most urgent needs. 

At the same time, California’s research and technology 
institutions, and its decision makers, could harness more 
of the state’s considerable research muscle in energy and 
energy policy to address the bigger picture: creating a more
resilient, cleaner grid for the whole state. 

Researchers at Berkeley and elsewhere have spent years 
developing smart-grid technologies that allow more control 
of where electricity goes and when. Economists are calculat- 
ing the costs and benefits of different kinds of energy infra- 
structure, such as installing solar panels, or using fuel cells 
powered by renewably produced hydrogen. 

More of this pioneering work should be deployed to solve 
problems in the institutions’ home state. Like the back-up 
power system that Berkeley used when the grid failed, a 
wider network of increasingly smaller grids that can be 
isolated or boosted as needed might be the future. 

California’s fires are now a chronic problem. A safe, clean,
efficient and resilient grid has to be a shared responsibility,
and not something for politicians alone to fix. The state’s 
dynamic research, technology and innovation communi- 
ties must step up to solve the problems in their individual 
organizations and at the same time craft wider solutions 
that help California — along with regions worldwide — 
adapt to our thirst for more energy in an increasingly 
warmer world.


A personal take on science and society


World view 


By Jerry Ravetz 


Stop the science training 
that demands ‘don’t ask’ 


It’s time to trust students to handle doubt 
and diversity in science, says Jerry Ravetz. 


As a child, I realized that my parents spoke in
Yiddish when they didn’t want me to know
what they were talking about, so I became aware
that some knowledge was intended only for
grown-ups — don’t ask. In college, I was taught
an elegant theory of chemical combination based on excess
electrons going into holes in the orbital shell of a neighbouring
atom. But what about diatomic compounds like oxygen
gas? Don’t ask; students aren’t ready to know. In physics, I
learnt that Newton’s second law of motion is not an empirical,
approximate relation such as Boyle’s and Hooke’s laws,
and instead has a universal application; but what about the
science of statics, in which forces are balanced and there is
no acceleration? Don’t ask. Mere students are not worthy of
an answer. Yet when I was moonlighting in the social sciences
and humanities, I found my questions and opinions were 
respected, even if only as part of my learning experience. 

Observant students will notice that social problems 
surrounding science are seldom mentioned in official 
curricula. And now, these pupils are starting to act. They 
have shamed their seniors into including more diverse con- 
tributors as faculty members and role models. Young schol- 
ars insolently ask their superiors why they fail to address 
the extinction crises elucidated by their research. Such sub- 
versions are reminiscent of the mass-produced heretical 
pamphlets circulated by Martin Luther’s supporters at the 
start of the Protestant Reformation in sixteenth-century 
Europe. The inherited authoritarian political structures of 
science education are becoming brittle — but still remain 
largely unchanged from my own school days. 

The philosopher Thomas Kuhn once compared taught 
science to orthodox theology. A narrow, rigid education 
does not prepare anyone for the complexities of scientific 
research, applications and policy. If we discourage students 
from inquiring into the real nature of scientific truths, or 
exploring how society shapes the questions that research- 
ers ask, how can we prepare them to maintain public trust 
in science in our ‘post-truth’ world? Diversity and doubt 
produce creativity; we must make room for them, and stop 
funnelling future scientists into narrow specialties that 
value technique over thought. 

In the 1990s, Silvio Funtowicz, a philosopher of science,
and I developed the concept of ‘post-normal science’, building
on the Kuhnian terms ‘normal’ and ‘revolutionary’ science.
It outlines how to use science in a society confronted
with high-stakes decisions, where both facts and values 
are uncertain; it requires drawing on a broad community 
with broad inquiries. Suppressing questions from budding
scientists is sure to suppress promising ideas and solutions.

As a nonagenarian and former historian of science, I
know that even foundational building blocks can be ques- 
tioned. The unifying patterns of the periodic table are now 
seen, under closer scrutiny, to be riddled with anomalies 
and paradoxes (E. Scerri Nature 565, 557-559; 2019). Some 
scientists now wonder whether the concept of biological 
‘species’ contributes more confusion than insight, and 
whether it should therefore be abandoned (see go.nature.com/2offaav).
However, such a decision would affect conservation
policy, in which identification of endangered
species is crucial — so it is not just an issue for basic science.

Science students generally remain unaware that concepts 
such as elements and species are contested or are even contestable.
In school, college and beyond, curricula highlight
the technical and hide the reflective. Public arguments 
among scientists often presume that every problem has just 
one solution. When they were students, these researchers 
had never learnt that they have a right to be wrong. 

And when scientists advise on policy, they are pressured 
to become attached to official stances on issues, or to shun
the responsibility entirely. They then find it difficult to 
resist dismissing all critics as cranks or ‘denialists’, whose 
rejection of ‘facts’ is a sign of their depravity. (To be sure, 
much of science denial is cynical and self-serving.) 

Nonetheless, vacillating advice on complex issues, most 
obviously nutrition, should be a warning that, from a future
perspective, today’s total scientific consensus on some policy
issue might have been the result of obduracy, a conflict
of interest or worse. 

Trust in established science will not be protected by 
exhortations, denunciations and absolutism. Just as a 
healthy democracy accommodates dissent and dissonance, 
the collective consciousness of science would do well to 
embrace doubt and diversity. This could start with teaching 
science as a great, flawed, ongoing human achievement, 
rather than as a collection of cut-and-dried eternal truths. 
There is plenty of material for such a Socratic education in
science: physics and cosmology now enjoy creative igno- 
rance; the digital and life sciences abound in moral mazes; 
and environmental and sustainability sciences demand 
recognition of complexities. The established ‘facts’ can 
function as tools for ongoing dialogues. 

I recall a legendary chemistry professor who was inept at
getting classroom demonstrations to work — but discussing 
what went wrong helped his students to thrive. A mathematician
friend ran his classes like those in an Athenian agora:
pupils discussed every statement in the textbook until all 
were satisfied. They did very well in exams, and taught them- 
selves when he was absent. Treating people at all levels as 
committed thinkers, whose asking teaches us all, is the key
to tackling the challenges to science in the post-trust age. 


Nature | Vol 575 | 21 November 2019 | 417


© 2019 Springer Nature Limited. All rights reserved. 


L TO R: KIN CHEUNG/AP/SHUTTERSTOCK; JAXA, CHIBA INSTITUTE OF TECHNOLOGY & COLLABORATORS; DARREN PATEMAN/EPA-EFE/SHUTTERSTOCK; DREW ANGERER/GETTY


The world this week 


News in brief


ENVIRONMENT 
AGENCY PUSHES TO 
RESTRICT DATA USE


Scientists are alarmed about 
the expansion of a proposed 
rule that would limit which 
studies the US Environmental 
Protection Agency (EPA) can 
use to develop health and 
environmental regulations. 

The supplemental rule 
builds on a controversial
proposal released last year that 
would prevent the EPA from 
considering research unless the 
underlying data are publicly 
available, according to a leaked 
draft reported by The New 
York Times on 11 November. 
Critics of the original proposal 
feared that it would prevent 
consideration of research, such 
as epidemiological studies, 
based on confidential health 
data. 

That proposal would have 
applied to a restricted number 
of studies. But scientists say 
the leaked supplement is worse 
because it would expand the 
rule to cover almost any kind of 
research. The text also suggests 
that the rule could apply to data 
regardless of when they were 
generated, potentially affecting 
the agency’s consideration of 
previously published studies. 

If the final rule looks like
the leaked proposal, “it will 
fundamentally change the 
way EPA uses science to make 
public-health decisions — to the 
detriment of public health”, says 
Veena Singla, a public-policy 
and health researcher at the 
University of California, San 
Francisco. 

EPA officials stressed that 
the final proposal under 
review at the White House is 
different from the leaked draft. 
The agency must publish the 
final text and accept public 
comments before it can finalize 
the supplemental rule. 


VIOLENCE IN HONG 
KONG DISRUPTS 
RESEARCH 


Three universities in Hong 
Kong have cancelled classes on 
campus for the rest of the term 
after violent clashes between 
police and protesters erupted 
in the grounds. Another four 
universities have also cancelled 
classes — in some cases for the
rest of the year — over safety 
concerns. And staff at most of 
the institutions have been told 
to stay away for several days. 

Images show some protesters 
carrying bows and arrows — one 
police officer was reportedly hit 
in the leg with an arrow. 

The clashes are the latest flare-up
in Hong Kong, and follow
six months of street protests. 
These started in June against an 
extradition bill that would have 
allowed people to be sent from 
the territory to mainland China 
to stand trial or serve criminal 
sentences. 

The protests on campuses are 
also disrupting research, and 
some scientists fear that this 
could dissuade academics from 
coming to Hong Kong. 

Michael Chan, a chemist at 
the Chinese University of Hong 
Kong, says he has been unable to 
access his lab to check on mouse 
experiments. 

Jianhua Zhang, the dean of 
science at Hong Kong Baptist 
University, worries that the 
ongoing protests will have a 
wider effect on academia. “I 
anticipate that people will be 
reluctant to take offers to work 
with us,” he says.


BYE BYE, RYUGU!
CRAFT LEAVES 
ASTEROID 


The Hayabusa2 spacecraft
is heading home after
performing a series of risky and 
unprecedented manoeuvres on 
its six-year mission to asteroid 
Ryugu (pictured). 

The Japan Aerospace 
Exploration Agency's (JAXA’s) 
probe gently fired its thrusters at 
10:05 a.m. Japan Standard Time 
on 13 November, moving away
from the asteroid at a speed of 
less than 10 centimetres per 
second. From 10 December, the 
probe will start to use its ion 
engines to propel its journey 
back to Earth, where it is due 
to arrive at the end of 2020. A
re-entry capsule will deliver its 
samples to the surface. 

Hayabusa2 was launched 
in late 2014, and arrived at 
Ryugu in June 2018. It is the first 
mission to release landers onto 
the surface of an asteroid; the 
first to collect a sample from a
‘dark’ asteroid’s surface; and, 
after bombarding the surface 
to create a crater, the first to 
collect a sample of an asteroid’s
subsurface material. 

Just one kilometre wide and 
shaped like a spinning top, 
Ryugu is an unusually dark 
body, probably the result of 
having a high concentration of 
carbon. Initial studies based on 
Hayabusa2’s data suggest that 
Ryugu formed from the debris 
of an impact between two larger 
Solar System bodies.


Bush fires
wreak havoc 
in eastern 
Australia 




Firefighters near the town of Nabiac in eastern 
Australia have been confronting a huge wildfire that 
has burnt more than 30,000 hectares over the past
10 days. Several hundred conflagrations have burnt
more than one million hectares, destroyed more than 
450 homes and killed 4 people across the state of New 
South Wales since 8 November. More than 50 fires were 
still burning as Nature went to press. 

Scientists have forecast particularly severe conditions 
for bush fires this season because large parts of the 
country are in drought. Climate change is also making
fire conditions more frequent and severe. 

On 12 November, hot, dry and windy conditions
prompted a ‘catastrophic’ fire warning for vast areas of 
New South Wales, including Sydney and its surrounding 
areas; blazes that ignite during these conditions are 
likely to spread out of control quickly and houses 
are unlikely to survive. It is the first time that the 
catastrophic fire rating has been issued for Sydney since 
new ratings were introduced in 2009. 


FEARS OF FOREIGN 
INTERFERENCE 
PROMPT UNIVERSITY 
GUIDELINES 


New guidelines will help 
Australian universities to 
protect themselves against 
foreign interference, says the 
country’s government. The 
advice follows concerns that 
foreign groups or authorities, 
such as the government of 
China, might be seeking to 
instigate campus activities that 
are against Australia’s interests. 

The guidelines, released on 
14 November, were developed 
by the University Foreign 
Interference Taskforce, which 
includes representatives from 
universities, national security 
agencies and the education 
department. 

Education minister Dan 
Tehan, who set up the task 
force in August, said foreign- 
interference threats against 
Australia, including its 
universities, had reached 
“unprecedented levels”, but gave 
no details at a press briefing. 

The guidelines advise 
universities to undertake due 
diligence before entering into 
research or other collaborations 
with international partners, 
and to implement robust 
cybersecurity strategies. 

In late 2018 and early 2019, the 
Australian National University 
in Canberra experienced 
significant data breaches, 
in which hackers accessed 
19 years’ worth of personal data 
from the university’s network. 
Media reports have suggested 
that the hack was perpetrated 
from China, but the Australian 
government says the attack has 
not been attributed to any one 
country. 

Politicians and academics 
have also raised concerns about 
some artificial-intelligence 
projects involving Chinese 
universities and Australian 
researchers. 


GOOGLE HEALTH-DATA 
SCANDAL SPOOKS 
RESEARCHERS 


Google and one of the largest 
health-care networks in the 
United States are embroiled 
in a data-privacy controversy
that researchers fear could 
jeopardize public trust in 
data-sharing practices and, 
potentially, academic studies. 
At issue is an agreement,
dubbed Project Nightingale, 
that gives Google access to 
the health-care information, 
including names and other 
identifiable data, of tens of 
millions of people without their 
knowledge. The people were 
treated at facilities run by the 
health network Ascension. 
Google says that the project, 
first reported in The Wall Street 
Journal on 11 November, is
meant to develop technology 
that would enable Ascension to 
deliver improved health care. 
Both companies say that they 
abided by US laws to protect 
health-care information. But 
the US Department of Health 
and Human Services says it is 
now looking into “this mass 
collection of individuals’ 
medical records with respect 
to the implications for patient
privacy”. Researchers worry that 
the revelations will undermine 
trust in studies more broadly. 
“With these incidents, we 
undermine public trust to this 
whole enterprise,” warns Effy 
Vayena, a bioethicist at the Swiss 
Federal Institute of Technology 
in Zurich. “At some point, all of 
the research will get a bad name.”


The world this week


News in focus 


Waterholes visited by the endangered Gouldian finch contained trace DNA that allowed scientists to detect the bird’s presence. 


RARE BIRD'S DETECTION 
HIGHLIGHTS PROMISE OF 
‘ENVIRONMENTAL DNA’


Researchers are increasingly using traces of genetic material 
in soil, water or ice to track rare and endangered species. 


By Dyani Lewis 


DNA gathered from remote waterholes
in northern Australia has been used
to detect a rare bird in the wild for
the first time. The result is the latest
milestone in the rapidly maturing science
of environmental DNA, in which traces of
genetic material from soil, water or ice are used
to reveal the presence of plants and animals.
In a study published on 14 November, a team
in Australia reports that genetic material collected
from waterholes showed that Gouldian
finches (Erythrura gouldiae) had visited them 
in the previous 48 hours. Rangers also con- 
firmed the species’ presence at the locations. 


Scientists have been using environmental 
DNA (eDNA) analysis for about 15 years, for 
purposes including tracking rare or elusive 
aquatic species, such as the great crested newt
(Triturus cristatus) in the United Kingdom.
And in the past few years, the technique has 
increasingly been used to identify mammals, 
insects — and now birds — that live on land. 

Testing for eDNA is often safer — for both 
animals and researchers — more cost-effective 
and, in some cases, more accurate and sensi- 
tive than conventional methods used to pin- 
point rare and endangered species, scientists 
say. This is prompting regulatory agencies in a
number of countries to adopt the technology 
to locate creatures, such as the endangered
Canada lynx (Lynx canadensis) in the United
States, or to monitor for invasive species. 

But the technique is yet to convince some 
scientists, who say eDNA results aren't robust 
enough to be used as the sole basis for making 
environment-management decisions that can 
have legal implications for governments and 
land owners. 

Early studies that used eDNA to pinpoint 
specific species were criticized because of 
the potential for improper handling of sam- 
ples to cause cross-contamination, leading 
to false-positive results. Scientists using the 
method are detecting only trace amounts of 
genetic material, so even minute amounts 
of contamination can taint the results. But
proponents of the field say that the recent
adoption of rigorous protocols that avoid or
detect contamination has largely addressed
such issues. 

The first study to show that large-bodied 
animals and plants drop enough DNA into their 
environment — through defecation and shedding
cells — to be detected was published in
2003. Five years later, another team showed 
that DNA in pond water could be used to detect 
the invasive American bullfrog (Rana catesbeiana).
Most such studies gather genetic material
from aquatic environments because DNA
disperses and remains free-floating in water, 
and can be detected in trace amounts. 


Massive time savings 


Around 2014, Michael Schwartz, who heads 
up the US Forest Service’s National Genomics 
Center for Wildlife and Fish Conservation in 
Missoula, Montana, and his team used eDNA 
to detect the endangered and hard-to-mon- 
itor bull trout (Salvelinus confluentus). The 
researchers initially analysed 124 water 
samples from waterways across Montana,
amassing a volume of data equivalent to that 
collected over the previous 15 years through 
conventional surveys that used electrofishing, 
a method that is risky for people and fish, in 
which a current is run through the water to 
attract and then net fish. “We were able to do 
that in eight days,” Schwartz says. “We have 
estimated that it is about two to ten times 
faster and two to five times more cost-effective 
to use eDNA compared to electrofishing.” 

Earlier this year, Schwartz's team published 
results showing that DNA left in snow tracks 
or in snow near camera traps could be used 
to identify the presence of Canada lynx and 
wolverine (Gulo gulo) in Montana, and a small
carnivorous mammal called the fisher (Pekania
pennanti) in Idaho. Conventional methods
for detecting the presence of land animals typically
involve time-consuming surveys to identify
an animal by its tracks alone, or from scat.

In another case, eDNA was more sensitive 
than conventional methods. When a camera 
trap image was unable to clearly identify what 
looked to be a Canada lynx in an area where 
its presence was unknown to rangers, DNA 
extracted from the snow confirmed that the 
creature was indeed a lynx, says Schwartz. 

In some cases, eDNA analyses are being used
to enforce policy. In 2014, the UK government
approved the use of eDNA analysis for detecting
the endangered great crested newt in land-use
surveys that are required by law.

With a burgeoning market for eDNA anal- 
yses, dozens of companies now offer genetic 
tests for detecting rare species. 

To reduce problems such as false positives
that plagued the field in its early days, there are
now standard methods for handling samples
and detecting contamination, says Florian
Leese, an aquatic ecologist at the University
of Duisburg-Essen in Germany. Adequate
sampling, sterile equipment and experimental
controls can all help to guard against contamination.
DNAqua-Net, a European-based network
of researchers who work with industry
bodies and regulatory agencies, is developing
best-practice guidelines on how to design
and validate tests for individual species and to
define the amount of DNA needed to be sure a
test returns a genuine positive result.


DNA from snow tracks allowed scientists to
detect the presence of the Canada lynx.

But some ecologists are reluctant to
abandon conventional methods. Jean-Marc
Roussel, an aquatic ecologist at the French 
National Institute for Agricultural Research 
in Rennes, says that more studies comparing 
the cost and accuracy of eDNA analysis to con- 
ventional monitoring methods are needed 
before environment-management decisions 
are made on the basis of eDNA results. 

Molecular ecologist Cecilia Villacorta Rath 
at James Cook University in Townsville, Aus- 
tralia, thinks researchers also need to demon- 
strate that genetic tests are sensitive and 
specific enough to avoid false negatives — the 
failure to detect a target species that is there. 

Robust results are essential because the 
discovery of an endangered species can have 
weighty legal ramifications. In the United 
States, such species need to be protected under 
the Endangered Species Act, so an area could 
be designated a critical habitat as a result. 

As the chair of DNAqua-Net, Leese is leading 
the charge to develop standards that ensure 
genetic tests are accurate and give agencies 
confidence in their results. The next step could 
be to certify companies and laboratories doing
eDNA studies, he says.


ITALIAN PLAN FOR NEW
RESEARCH AGENCY 
DRAWS CRITICISM 


Scientists say they haven’t been consulted on the 
creation of another national science funder. 


By Marta Paterlini 


The Italian government is debating
whether to set up a national research 
agency — an organization that could 
boost research funding by hundreds of 
millions of euros a year. But although 
scientists have long called for such an agency, 
some are concerned about the latest plans. 
They worry that researchers haven't been 
involved in discussions about the organi- 
zation, and that it won’t be independent of 
political influence. 
Prime Minister Giuseppe Conte, who leads 
a coalition government of the populist Five 
Star Movement and the centre-left Demo- 
cratic Party, mentioned the idea for a National 
Research Agency (ANR) in a September
speech. The proposal will be discussed in
parliament this month as part of Italy’s 2020 
budget bill. 

Italy already has several mechanisms 
for funding basic science, but researchers
complain that the system is haphazard,
and that calls for grant proposals are often 
delayed. The country’s existing National 
Research Programme has a budget of 
€2.5 billion (US$2.8 billion) for 2015-20. But
the scheme’s main source of money for basic
research — the Research Projects of National
Relevance programme — last made a grant call
in 2017. Moreover, Italy invests only 1.2% of its 
gross domestic product in research — far below 
the European Union target of 3%. 

Many scientists had hoped for an agency 
that would simplify research funding, but note 
that the ANR instead adds another organization
with its own budget. And it is not yet clear
how the ANR would interact with Italy’s other 
science-funding mechanisms. The bill up 
for discussion states that the agency would 
coordinate the direction of research at uni- 
versities and public research bodies, fund 
“highly strategic” projects and encourage 
Italian participation in European and inter- 
national research initiatives. It would receive 
€25 million in 2020, €200 million in 2021 and 
€300 million per year from 2022. 


Missed opportunity 


“It is promising that the matter is part of the 
current government’s strategy. Unfortu- 
nately, the model behind it is not yet clear,” 
says Vincenzo Costanzo, a cancer researcher 
at IFOM, a molecular-oncology institute in 
Milan. The move is a missed opportunity to 
bring all government research funding under a
single body in a transparent and independent
manner, he adds. “We really need an agency 
that regulates the annual grant calls.” 

Researchers also worry that they have not 
been involved in the ANR’s planning, and are 
concerned about the agency’s political inde- 
pendence. According to the bill, the ANR’s 
leaders will be appointed mainly by politicians: 
the prime minister would choose the director, 
and government ministers would select most 
of the agency’s eight-member executive com- 
mittee. Many had instead hoped for an agency 
overseen by research managers and scientific 
advisers. 

Overall, the agency is a positive step, says 
Giuseppe Remuzzi, director of the Mario Negri 
Institute for Pharmacological Research in 
Bergamo. But the government’s role should 
be restricted to making suggestions about 
appointments, and executive-committee 
members should be chosen by a group 
operating under the best practices used 
by the international scientific community, 
he says. 

Lorenzo Fioramonti, Italy’s research 
minister, says that scientists should feed into 
the ANR’s development. He was involved in 
the idea to create the agency, but says he was 
surprised that the draft law also included 
information on the agency’s governance. “The
agency’s function and governance can only be 
decided after a discussion with the research 
community,” he says. Fioramonti had hoped 
that the bill would serve only to set up the 
agency, with details of its governance and grant 
management decided early next year. 


FIRST VACCINE AGAINST
DEADLY EBOLA VIRUS
WINS APPROVAL


The shot has already been given to hundreds of 
thousands of people in ongoing Africa outbreak. 


An Ebola vaccine has been approved by the European Medicines Agency. 


By Ewen Callaway 


The world finally has an Ebola vaccine.
On 11 November, European regulators
approved a vaccine that has already 
helped to control deadly outbreaks 
of the virus — the first time any immu- 
nization against Ebola has passed this hurdle. 

The decision by the European Medicines 
Agency (EMA) to allow US pharmaceutical 
company Merck to market its vaccine means 
that the product can now be stockpiled and, 
potentially, distributed more widely than it is
now, particularly in Africa. In 2015, Gavi, the 
Vaccine Alliance — a global health partner- 
ship based in Geneva, Switzerland, that funds 
vaccine distribution in low-income countries 
— told manufacturers that it would commit 
to purchasing their Ebola vaccines once they 
had been approved by a “stringent health 
authority”, such as the EMA. 

Although several other vaccines against 
Ebola — a haemorrhagic fever that causes 
severe diarrhoea, vomiting and bleeding — 
are in development, Merck’s is the only one 
that has been tested during an outbreak, in 
which it was shown to be highly effective at 
preventing infection. 

The vaccine, first patented in 2003, has 
been administered on an emergency basis to
quell the ongoing outbreak in the Democratic 
Republic of the Congo (DRC), which has killed
some 2,000 people since it started last year. It
was also used during a 2018 outbreak in that
country, and in Guinea in 2015. In the current
outbreak, hundreds of thousands of people 
have received the Merck shot, including more 
than 60,000 health-care workers in the DRC 
and several neighbouring countries. 

“This is a vaccine with huge potential,” said 
Seth Berkley, chief executive of Gavi, in a press
release after the EMA’s decision. “It has already
been used to protect more than 250,000 people
in the DRC and could well make major Ebola
outbreaks a thing of the past.” The organization
has supported the stockpiling and delivery
of Ebola vaccines and hopes to build up a
global supply that could be rolled out quickly 
during future outbreaks. 


Future protection 


The EMA’s approval “makes a big difference”, 
says David Heymann, an epidemiologist at 
the London School of Hygiene and Tropical 
Medicine. But he stresses that research 
into the Merck vaccine and development of 
others must continue. “The message is that 
the research is not done,” he adds. Research 
could help to develop vaccines that offer 
longer-lasting immunity, target more than 
one species of Ebola and are easier to store. 
Merck’s vaccine, which is marketed under 
the name Ervebo and known to researchers 
as rVSV-ZEBOV-GP, was tested in a clinical trial
conducted in Guinea towards the end of the
2014-16 Ebola outbreak in West Africa. There, 
the vaccine was administered to people who 
had been in contact with someone who was 
infected with Ebola, and to their subsequent 
contacts. It was found to offer a high level of 
protection against infection. 

Health workers have used this strategy — 
known as ring vaccination — in the two other
outbreaks in which rVSV-ZEBOV-GP had been 
deployed. But Heymann says it’s important 
to determine whether the Merck vaccine has 
other uses — for instance, preventive admin- 
istration to emergency health workers who 
might encounter Ebola in the distant future. 
For this, researchers will need to determine 
how long the vaccine’s protection lasts, 
and whether a ‘booster’ dose can extend 
immunity. 

Such studies are in the works with rVSV-ZEBOV-GP
and competing vaccines, says Adrian
Hill, a vaccinologist at the University of Oxford,
UK. “The question remains, which vaccine 
would you give to, say, health-care workers 
to prevent them getting Ebola?” 

Merck’s product protects against the Zaire 
species of the Ebola virus, which is behind 
the current DRC outbreak and the 2014-16 
West Africa outbreak. It will be important 
to develop vaccines against other species 
of the virus — especially the Sudan species, 
which has caused seven known outbreaks 
since 1976, says Hill, who helped to test 
an Ebola vaccine that the London-based 
pharmaceutical company GlaxoSmithKline 
shelved in August. 

There are seven other Ebola vaccines in 
various stages of clinical testing, according 
to the World Health Organization (WHO) 
in Geneva. In September 2019, the WHO 
announced that a vaccine manufactured by 
Johnson & Johnson in New Brunswick, New 
Jersey, would be used in the current DRC out- 
break. Last week, the company submitted that 
vaccine for EMA approval. 

Unlike the Merck vaccine, which is given in 
one dose, the Johnson & Johnson immuniza- 
tion requires a booster shot that is adminis- 
tered 56 days after the first injection. In the 
DRC, it will be given to people at risk of Ebola, 
such as health-care workers, in areas where
the virus is not already circulating. 

And next month, Gavi’s board will decide 
whether to establish a global stockpile of Ebola 
vaccines. Merck, which is headquartered in 
Kenilworth, New Jersey, is seeking approval 
for its vaccine by the US Food and Drug 
Administration. 

On 12 November, the WHO announced it 
had “prequalified” the Merck vaccine, which 
means that the product meets the agency’s 
standards for quality, safety and efficacy. 
Other UN agencies, Gavi and many national 
health agencies look to this endorsement 
when procuring and delivering a vaccine.


Astronomers have used data from NASA's
Cassini mission to map the entire surface 
of Titan, Saturn’s largest moon, for the first 
time. Their charts reveal a diverse terrain 
of mountains, plains, valleys, craters and 
lakes unlike anywhere in the Solar System 
outside Earth. 

The Cassini spacecraft orbited Saturn 
from 2004 to 2017 and collected vast 
amounts of information about the gas 
giant and its moons. The mission included 
more than 100 fly-bys of Titan, allowing 
researchers to glimpse the moon’s surface 
through its thick atmosphere and survey its 
terrain in unprecedented detail. 

Rosaly Lopes, a planetary scientist 
at NASA's Jet Propulsion Laboratory in 
Pasadena, California, and her colleagues 
stitched together images and radar 
measurements taken by the spacecraft to 
produce the global map of Titan, which 
they published on 18 November in Nature 
Astronomy (R. M. C. Lopes et al. Nature 
Astron. http://doi.org/dfb8; 2019). 

“Titan has an atmosphere like Earth. It 
has wind, it has rain, it has mountains. It’s a 
really very interesting world, and one of the 
best places in the Solar System to look for 
life,” says Lopes. 

Nearly two-thirds of Titan’s surface 
consists of plains, the map reveals, and 
17% is covered in sandy dunes shaped
by the wind, mostly around the equator.
Around 14% of the surface is classified as 
‘hummocky’ — hilly or mountainous — and 
1.5% is ‘labyrinth’ terrain, with valleys 
carved by rain and erosion. There are 
surprisingly few impact craters, suggesting 
that the moon's surface is fairly young. 

Titan is the only world in the Solar System 
aside from Earth with known bodies of 
liquid on its surface. However, these seas 
and lakes are filled with liquid methane 
rather than water, and they cover just 1.5% 
of the moon's surface. 

“The most profound discovery of 
Cassini is that Titan is so diverse,” says 
Ralph Lorenz, a planetary scientist at the 
Johns Hopkins University Applied Physics 
Laboratory in Laurel, Maryland. “It’s almost 
like a completely different world.”

By 2034, NASA plans to send a drone 
to Titan on the Dragonfly mission, which 
will fly across the surface and study it in 
multiple locations. But there are no current 
plans to send further orbiters to Saturn or 
its moons, so this map is likely to remain our 
best global view of Titan for the foreseeable 
future. 


By Jonathan O'Callaghan 


LAB SEQUENCES GENOMES
OF A CONTINENT'S
BUTTERFLIES


Draft genomes of more than 800 varieties hint at 
the role of interbreeding in the animal’s evolution. 


By Ewen Callaway 


When biologist Nick Grishin
wanted to tackle big questions in 
evolution — why some branches 
of the tree of life are so diverse, 
for instance — his team set out to 
sequence the genomes of as many butterflies 
as it could: 845 species, to be precise. 
In a study that some researchers are hailing
as a landmark in genomics, Grishin’s group at
the University of Texas Southwestern Medical 
Center in Dallas sequenced and analysed the 
genome of what it called a “complete butterfly 
continent”: every species of the creature in 
the United States and Canada. The study was 
posted on the bioRxiv server on 4 November.
“I think it’s bloody amazing, because the
technology involved in sequencing 845 species 
is there,” says James Mallet, an evolutionary 
biologist at Harvard University in Cambridge, 
Massachusetts. “It’s a beautiful piece of work.” 
The data allowed Grishin’s team to build an 
evolutionary tree detailing the relationships 
of all the butterflies, as well as to determine 
the pace at which new species formed. The
team suggests that fast-diversifying groups
of butterflies are those that swap genes 
with close relatives through interbreeding 
— a phenomenon that could extend to other 
organisms. 

Others, however, point out that most of 
these genomes will be of limited use to other 
researchers, because they are low-quality 
‘drafts’ comprised of thousands of short DNA 
stretches, and not higher-quality sequences 
that have been assembled into longer stretches. 
Grishin says that the sheer number of genomes, 
even of low quality, allows his team to draw 
broad conclusions about evolution that could 
not be made from more limited data sets. He 
plans to make the genomes publicly available. 


Butterfly patterns 


Grishin, whose research group studies the 
shape and evolution of proteins, started 
researching butterflies after reading a 
2012 paper on the diverse tropical genus
Heliconius, whose species have elaborate 
wing patterns that mimic those of other butterflies.
The study found that some genes
that determine wing patterns seemed to have
been passed between three Heliconius species
through interbreeding, instead of being inherited
from the species’ common ancestor, and
suggested that such swaps explain the huge
diversity of Heliconius butterflies.

A Heliconius butterfly.

Inspired by that work, Grishin wondered 
whether such a connection could be seen in
other butterflies. “Some groups diversify very 
rapidly and there are many species in them, 
and others are kind of empty,” he says. “So to 
understand why and how that happens, we 
would need to sequence them all.” 

At one time, sequencing hundreds of 
butterfly genomes would have been unafford- 
able, but costs have plummeted in recent 
years. Collecting samples for every species 
in the United States and Canada was still a 
challenge, however. Grishin’s team worked 
with amateur butterfly enthusiasts as well as 
museum collections across the United States 
to gather data — a single leg from a dead 
specimen was enough to obtain a draft-quality 
genome. 

Once they had sequenced the genomes of 
all 845 species, the researchers worked out 
the evolutionary relationships. Their butterfly 
family tree broadly agreed with existing ones 
based on anatomy and more limited genetic 
analyses, although the group did reclassify 
40 species and suggested several new group- 
ings at the genus level. 


The tree also revealed that some butterfly 
groups have evolved faster than others. Two 
of the fastest-evolving ones, commonly known 
as the blues and the whites, have developed 
highly specialized interactions with other 
organisms that might explain their rapid 
evolution, say Grishin’s team. The blues, or 
Polyommatinae, form symbiotic relationships
with ants, whereas whites, or Pierini, have
developed adaptations to feed on mustard 
plants that are toxic to many other insects. 

An analysis of genes shared by multiple 
species also showed that these diverse groups 
were likely to have acquired genes through 
interbreeding. Many of the genes that are 
swapped between species are thought to be 
involved in mate recognition and other fac- 
tors that can cause species splits. Grishin says 
that by spreading such genes, interbreeding 
— rather than the gradual accrual of new mutations
— could be helping to drive the evolution
of butterfly species. 

The link between interbreeding and
speciation is “an idea that is sort of coming to
the fore”, says Mallet, who co-led a team that
reported similar findings in Heliconius butterflies
this month.


Missing data 

In addition to the draft genomes, Grishin’s 
team generated ‘reference’ genomes, in 
which genes are assembled into chromosome 
sequences, for 23 species. 

High-quality genomes such as this are 
the targets of other large-scale projects to 
sequence the tree of life. In 2018, a consortium
called the Earth BioGenome Project laid out 
plans to decode the genomes of the roughly 
1.5 million known species of eukaryote — 
animals, plants, protozoans and fungi — at an 
estimated cost of US$4.7 billion over 10 years. 

Grishin is enthusiastic about these efforts, 
particularly for vertebrates. But he thinks 
there are too many unknown species of invertebrate
to sequence them all in the near future.
“I don’t think they will succeed very quickly,”
Grishin says. “Our efforts — where we just jump
right in and do things right away without much 
fuss about it — may be helpful.”


Some research institutes now pay
for independent screening of their
scientists’ manuscripts.

By Alison Abbott


On 15 June 2017, scientists at a
respected biological institute in
Germany were thrown into crisis
by an alarming announcement.
An investigation into the Leibniz
Institute on Aging had found that its
director, cell biologist Karl Lenhard
Rudolph, had published eight
papers with data errors, including improperly
edited or duplicated parts of images.
Investigators didn’t find deliberate fraud,
but Rudolph wasn’t able to present original
data to explain the problems. The Leibniz
Association, which runs the institute
in Jena and had commissioned the probe,
concluded that Rudolph hadn’t supervised
his lab group properly, and so was guilty of
“grossly negligent scientific misconduct”. It
applied the strictest sanctions it could, barring
the institute from applying for research
funding from the association while under
Rudolph’s leadership for three years. It also
ordered the centre to undergo an international
review, even though the last one had
been completed only a couple of years earlier.
Rudolph resigned as director.

It was the second calamity in a year for 
the centre, which is also known as the Fritz 
Lipmann Institute (FLI). Police had raided it in
2016 after allegations that the centre had vio- 
lated European regulations on animal experi- 
ments. The experiments were suspended, and
although the FLI was cleared of the allegations,
not all of the experiments had been re-authorized
when the Rudolph affair broke. “The sec-
ond crisis sent us into shock — it seemed more 
personal,” says molecular geneticist Chris- 
toph Englert, a group leader at the FLI, which 
employs 270 scientists. Most researchers at 
the centre hadn’t even known their director 
was under investigation. 

FLI leaders set about restoring the centre’s 
reputation. They began by phasing in manda- 
tory electronic databases and creating a sys- 
tem of thesis advisory committees to replace 
single PhD supervisors. The FLI’s head of core
facilities, Matthias Görlach, had a less conventional
idea. He contacted Enrico Bucci, a
molecular biologist who had visited the FLI
for some PhD work 18 years earlier, and with
whom he’d kept in touch. Bucci was now in the
business of checking research papers, Görlach
knew; in 2016, he’d founded a science-integ-
rity firm called Resis, based in Samone, Italy. 
Could the company perhaps help the institute 
to avoid errors in future? 

So began a remarkable system of outside 
vetting, in which researchers at the FLI must 
send every paper and master’s thesis across to 
Resis for screening before they submit them 
for publication. It’s an unusual step. Some 
journals check papers for errant statistics or 
manipulated images before publishing, but 
most research institutions say it’s up to the sci- 
entists themselves to ensure their manuscripts 
are correct. “I am not aware of any US institute
doing this,” says Lauran Qualkenbush, director 
of research integrity at Northwestern Univer- 
sity in Chicago, Illinois, and president of the 
US Association of Research Integrity Officers. 

And some researchers disapprove. “The 
moment an institution needs to constantly 
question the moral integrity of its scien- 
tists by double-checking submitted figures, 
the leadership should resign,” says Giulio 
Superti-Furga, director of the Research Center 
for Molecular Medicine in Vienna. 

But amid rising concern about the quality 
and reproducibility of research, particularly 
in the biomedical sciences, a handful of Euro-
pean institutions have told Nature that they 
have now hired external companies or ded- 
icated in-house experts to check research 
manuscripts. The institutions say the cost 
of the endeavour is worthwhile, not only for
the immediate benefit of the checks, but also
because it can help them to spot areas in which 
their scientists need extra training. 
Scientists at the FLI and other institutions 
see the extra layer of checks as protective, not 
intrusive. “Because of the manuscript check, I
sleep at night,” says one FLI group leader, Björn
von Eyss. “I had started to worry about whether
I had done something wrong in my papers,
maybe missed a label: a mistake can become
misconduct,” adds Lilia Espada, a postdoctoral
researcher at the centre. “Now that we submit 
to external checking, I have more confidence.” 


Science under scrutiny 


Across the research world, there is growing 
suspicion about sloppiness and outright 
misconduct in the scientific literature. The 
number of retractions of research papers has 
risen to around 1,400 a year, compared with 
about 40 at the turn of the millennium, notes 
Ivan Oransky, a journalist in New York City who
co-founded the website Retraction Watch, 
which monitors and reports on retractions. 

In 2016, Elisabeth Bik, a microbiologist then 
at Stanford University in California, reported 
that around 4% of more than 20,000 biomed- 
ical papers she had examined contained inap- 
propriately duplicated images. (Bik is now a
full-time research-integrity consultant.) And 
last year, Bucci reported that about 6% of a 
sample of 1,364 papers he had looked at con- 
tained at least one instance of image manip- 
ulation. 

Increasingly, fraud-busters are starting to 
hunt down manipulated images in published 
papers and flag them widely. Rudolph’s work 
is an example: the faults were exposed by an 
external whistle-blower, who sent the findings 
to Rudolph, then to the DFG, Germany’s main 
national funding agency, and to its independ- 
ent Ombudsman Commission. The Leibniz 
Association has declared a zero-tolerance 
approach, and young scientists at the FLI 
say they feel under pressure. Some have told 
Nature privately that they are worried because 
of the way in which even unintended errors in
papers are flagged publicly online. It can be 
easy to make a mistake when handling massive 
and complex biological data sets, they say — 
and they fear their papers might be publicly 
picked apart, derailing their careers before 
they get started. 

In this atmosphere, the idea of the sort of 
pre-submission screen that Bucci’s company 
was offering appealed to Görlach. Bucci had
been drawn into the world of research integ- 
rity after founding an image-search company 
called BioDigitalValley in Pont-Saint-Martin, 
Italy, in 2008 that aimed to sell a service to 
biomedical scientists who wanted all images 
relevant to a particular tissue or disease 
extracted from the literature. Bucci had first 
made a giant database of accessible biomedi- 
cal papers and cleared it of retracted articles.
He then checked the images in all publications
by the authors of those retracted papers. He 
found serious problems in the work of many 
of them, particularly that of Alfredo Fusco, a 
then-prominent cancer researcher at the Uni- 
versity of Naples Federico II. Fusco has now 
had 24 papers retracted and 10 corrected. The 
affair, which implicated scientists in Fusco’s 
network at other institutes in Italy and beyond, 
sent shock waves through the scientific com- 
munity. Bucci was so disturbed by what he saw 
that he switched career path, founding Resis, 
to try to do something about it. 


Restoring reputation 


After Görlach contacted him, Bucci gave
the FLI’s group leaders a presentation of his 
work. His company’s proprietary software 
scans images in a manuscript for duplication 
or unlikely composition, he told them. Resis 
has just two employees, but brings in consult- 
ants for particular contracts. In late 2017, FLI 
group leaders sent Bucci some sample papers 
and theses to check — and were impressed by 
the results. He picked up some small errors 
they hadn't spotted. The institute signed a 
contract with Resis to analyse the images in 
all papers, to do random checks on statistics
and also, in master’s theses, to look for plagia-
rism. Resis screens all manuscripts within 24 
hours of receipt, although if the screen flags 
problems, further analysis can take up to 
three more days. The institute budgets up to 
€50,000 (US$55,000) per year to cover both 
the service and its handling of the information 
that Resis supplies. 

The new system began in April 2018, and 
the first results proved its value, says molec- 
ular geneticist Alfred Nordheim at Germany’s
University of Tübingen, who became the
FLI’s interim scientific director when Rudolph 
stepped down. Resis found no serious prob- 
lems in the first 40 manuscripts that it ana- 
lysed for the institute, but it did flag at least 
one issue in 17 of them, Nordheim says. “Most
of these issues were to do with the use of statis- 
tics — things like undersampling or use of not 
fully suited statistical procedures,” he says. 
“The Resis analysis has been important for us 
because it allowed us to identify patterns of 
errors, and act accordingly.” Now, for exam- 
ple, the institute has introduced mandatory
statistical workshops for all of its scientists.

FLI researchers see the system as a positive 
step that is helping to protect them from error. 
Rudolph himself says that had the checking
system been in place earlier, he would have
caught the problems in his papers. (Five have
been corrected, one remains under discussion
at a journal, and in two cases, journal editors 
decided no correction was needed, he says.) 
Rudolph remains a lab leader at the FLI, but 
his group has now shrunk to seven scientists, 
half the size it was before the scandal broke. 

In June this year, Marco Foiani, the scien- 
tific director of IFOM, a molecular oncology 
research institute in Milan, Italy, learnt about 
the initiative during a meeting of the FLI’s 
international scientific advisory board, of 
which he is a member. It struck an immediate
chord with him: IFOM was itself reeling from 
research misconduct investigations involving 
a former director, Pier Paolo Di Fiore, who had 
co-authored some papers with Fusco that have 
been retracted. Di Fiore says he agrees with 
the retractions, but wasn’t involved in putting 
the figures together for the papers. IFOM had 
introduced electronic notebooks and other 
measures to promote good scientific practice, 
and Foiani decided to add on external check- 
ing, also using Resis. “It is very important for 
our image as an institute to get back on track,”
says Foiani. 

As at the FLI, young researchers at IFOM welcome
the screens. “Having a research scandal
can affect the credibility of the whole institute,”
says Ylli Doksani, one of IFOM’s 24 research
group leaders. “We are mostly funded by a
charity, and I am happy if the institute does
whatever is needed to maintain trust and show 
we take integrity issues very seriously.” 

Other organizations have decided to do 
publication checks internally. After the 
Beatson Institute in Glasgow, UK, had to deal 
with a retraction in 2012, it hired a dedicated 
integrity officer, former molecular biologist
Catherine Winchester, to check all papers 
destined for publication by eye. “It took only 
a short time for the more junior scientists to 
shed their fear that they were being policed, 
but there was immediate buy-in from senior 
PIs,” she says. “Now everyone is really grateful
for the service.” 


The cost of checks 


Some research organizations rule out exter- 
nal checks for themselves. The president 
of Germany’s Max Planck Society, Martin 
Stratmann, says that the society — which runs 
78 elite research institutes — does not need 
to commission outside checkers because 
research directors themselves have the man- 
date and responsibility to check every paper 
before it goes out. Some institutes Nature 
talked to for this story were unwilling to 
comment on the topic; others said only that 
they found it interesting. “We will monitor 
the process and discuss with our faculty,” says 
Bruce Stillman, director of the Cold Spring 
Harbor Laboratory in New York. 

Nor do all institutes that have been hit by 
a scandal see the need for screening. In 2012, 
the DFG judged that Silvia Bulfone-Paus, a 
senior scientist at another Leibniz institute, 
the Research Center Borstel, had failed in her 
supervisory duties after data manipulation 
was discovered in more than a dozen of her 
papers. Centre director Stefan Ehlers doesn’t 
think that paying for independent checks is the 
right way to approach these problems: rather, 
he says, it’s important to foster “a culture of 
trust and fearlessness to report mistakes and 
to discuss questionable data”. 

And pre-submission checks wouldn't stop 
all types of fraud, adds Shinya Yamanaka, a 
Nobel laureate and director of an institute 
that has recently experienced such a case, 
the Center for iPS Cell Research and Appli- 
cation at Kyoto University in Japan. There, in 
2018, stem-cell researcher Kohei Yamamizu 
was found guilty of fabricating and falsifying 
images in a high-profile paper in Stem Cell 
Reports. Yamanaka implemented measures 
such as electronic notebooks and mandatory 
storing of all experimental data — but did not 
opt for pre-submission checks, a method that 
“does not investigate whether experiments 
were truly carried out and recorded appropri- 
ately”, he told Nature in an e-mail. 

Still other institutions say that the 
checks are beyond their budget. The Italian 
National Research Council (CNR), which runs 
102 research institutes, would like to offer a 
full — but voluntary — screening service to its 
institutes, but says it can’t afford to. After the 
Fusco affair, it established a technical unit to 
use licenced Resis image-analysis software to 
check published papers. The unit provided for- 
mal comments on the report of the University 
of Naples’ investigation into Fusco’s papers, 
and now focuses on allegations of misconduct 
by CNR researchers. If an allegation surfaces, 
the unit examines all the papers the institute in 
question has published over the previous five 
years. Any manipulated images are recorded 
in a growing database. 

Last year, the CNR unit started preventive 
work on a modest scale: it has so far done a few 
pre-submission checks, responding to indi- 
vidual CNR researchers who were concerned, 
for example, about joining in as co-authors 
on particular manuscripts. “Prevention is the 
critical step,” says Cinzia Caporale, who leads 
the organization’s research integrity activi- 
ties from its headquarters in Rome. After 
the scandals in Italy, “scientists don’t always 
trust their colleagues any more”, she says. 
Caporale thinks the CNR’s work has increased 
scientists’ awareness: the council’s database 
suggests that its scientists are already pub- 
lishing fewer inappropriate images, she says. 
A higher budget would allow more systematic 
pre-checking, but Caporale says there is no 
prospect of that right now. 

Not many image-checking services have 
the capacity to rapidly screen a high volume 
of papers, as an institute — or a journal — might 
require. But some say they are interested. 
Sheridan, a large publishing-services firm in 
Hunt Valley, Maryland, already offers image 
forensics to journals, and told Nature that it is 
“open to the idea” of setting up such a service 
for institutions. Mike Rossner, who runs a small 
consultancy firm called Image Data Integrity 
in San Francisco, California, says he’d prefer to 
train someone from an institution’s research 
integrity office to do screening using his own 
manual system. Rossner is known for his expertise 
in spotting problems in papers by eye: as a 
former managing editor at the Journal of Cell 
Biology, he introduced checks of images in all 
papers accepted for publication — making the 
journal the first major life-sciences publication 
to institute the practice. 


Nurturing trust? 


Rossner thinks that investing in pre-checking 
could save money in the long run. “Prophylac- 
tic screening makes financial sense, because 
any case brought against an institution for 
publishing misleading data could cost an insti- 
tute even more in legal fees,” he says. It might 
even become a selling-point for institutes, 
suggests Caporale. “Being able, for instance, 
to tell journal editors that a paper has been 
independently checked may nurture trust,” 
she says. 

Even if that were true, it wouldn’t relieve 
journals of the responsibility to do their own 
checking, says Bernd Pulverer, chief editor of 
the EMBO Journal in Heidelberg, Germany. His 
journal checks images in all papers before they 
are accepted, and generally sees problems in 
around one in five manuscripts, a proportion 
that has not changed since the journal began 
the checks ten years ago, he says. Only a tiny 
minority (0.5%) of these involve outright fraud. 
Other journals now regularly check images 
too, although some (including Nature) do spot 
checks, not systematic ones. 

Journals don’t have the same jurisdiction as 
a scientist’s employer does to investigate problems, 
so the institute has an important role in 
ensuring quality, Pulverer adds. “But it is important 
for the employer not to start over-policing, 
because that can backfire,” he says. 

The FLI plans to continue working with 
Resis and thinks that the checks will make the 
institute more attractive in competing for the 
best scientists, says Nordheim. In June 2018, 
it reported its experience to a Leibniz Associ- 
ation leadership meeting on good scientific 
practice. Matthias Kleiner, the association’s 
president, was impressed. He is planning to 
test the possibility of introducing a certifica- 
tion system for good scientific practice for 
the association’s institutes. It’s possible that 
pre-submission checks could be an optional 
item on these certificates. For some Leibniz 
institutes, he adds, it could be a way “to protect 
scientists from being in danger of scientific 
misconduct”. 


Alison Abbott writes from Munich, Germany. 


Feature 


A composite image of interstellar Comet 2I/Borisov, taken by the Hubble Space Telescope. 


INTERSTELLAR 
INTRUDERS 


Astronomers grapple with the meaning of 
the first two objects entering our Solar System 
from distant regions. By Alexandra Witze 


From the tallest peak in Hawaii to a 
high plateau in the Andes, some of the 
biggest telescopes on Earth will point 
towards a faint smudge of light over 
the next few weeks. The same patch of 
sky will draw the attention of Gennady 
Borisov, an amateur astronomer in 
Crimea, and many other hobbyists 
who will sacrifice proper sleep and doze 
through their day jobs rather than miss this 
golden opportunity. 


434 | Nature | Vol 575 | 21 November 2019 


What they’re looking for is a rare visitor 
that is about to make its closest approach to 
the Sun. After that, they have just months to 
grab as much information as they can from 
the object before it disappears forever into 
the blackness of space. 

This chunk of rock and ice started its journey 
many light years from Earth, millions of years 
ago. The object got kicked out of its own neigh- 
bourhood by a violent gravitational push — 
maybe from a nearby planet, maybe from a 
passing star. Since then, it has been adrift in the 
space between the stars, eventually heading 
in our direction. 

On 30 August, Borisov spotted the object 
in the predawn sky — it was glowing dimly, 
with a broad stubby tail. Later named Comet 
2I/Borisov after its discoverer, it captured 
global attention because it’s only the second 
object — aside from exotic dust particles — ever 
known to have entered our Solar System from 
interstellar space. “This is my eighth comet, 
and so amazing,” says Borisov, who adds that it 
was “great luck that I got such a unique object”. 

It is remarkably different from the first 
interstellar interloper, which was a small, dark, 
rocky-looking object named 1I/‘Oumuamua 
that whizzed past the Sun in 2017. Together, 
these two interstellar objects are rewriting 
what researchers know about the icy bodies 
— estimated to number as many as 10° — that 
float unmoored throughout the Milky Way. 

Among other things, 1I/‘Oumuamua and 
2I/Borisov have provided the first direct 
glimpse of the physics and chemistry of the 
squashed debris clouds that surround young 
stars and serve as the birthing grounds for planets. 
These samples from other planetary systems 
are allowing scientists to explore whether 
the Solar System is unique or whether it shares 
building blocks with other planetary systems 
in the Milky Way. 

Because astronomers spotted 2I/Borisov 
on its way into the Solar System, they have 
many months to study it — unlike their fleeting 
glimpse of ‘Oumuamua, which was discovered 
on its way out. As a result, they expect to 
learn much more from 2I/Borisov, such as what 
chemical compounds make up its icy heart. It is 
their best look yet at an object known to have 
formed around another star. 

And as telescopes continue to probe the 
sky for faint, fast-moving objects, researchers 
expect that they will spot many more inter- 
stellar interlopers in coming years. “It’s been 
so much fun to see this suddenly crack open 
and watch a new field develop,” says Michele 
Bannister, a planetary astronomer at Queen’s 
University Belfast, UK. 


Dusty origins 

Interstellar objects probably began their lives 
when icy grains clumped together in a disk of 
gas and dust around a young star. These are 
the same regions where planets grow from 
small nuclei and then ping-pong into different 
orbits around the star because of collisions 
and gravitational shoves. 

The planets push through the icy rubble like 
a snowplough shouldering its way through a 
pile of hailstones. And modelling results sug- 
gest that the planets fling more than 90% of 
those ‘hailstones’ out of their star’s sphere of 
influence and into interstellar space. There 
they drift, as lonely scattered objects, until they 
happen to pass close enough to another star 
to be attracted by its gravity for a quick visit. 

Astronomers had expected that the first 
interstellar object they saw would look like 
a typical comet. Most comets in the Solar 
System hail from the distant realm known as 
the Oort cloud, a sort of cosmic deep freeze 
that lies roughly 1,000 times farther away 
from the Sun than Pluto. Occasionally, some- 
thing perturbs one of these comets and sends 
it careering towards the Sun; as it gets closer 
and warms up, its nucleus sprays out dust and 
gas that form a classic cometary tail. 

But when the first interstellar visitor showed 
up, it didn’t look like a conventional comet. 
Unlike them, ‘Oumuamua was tiny — just 
200 metres or so across — and rocky. Also, it was 
shaped like a cigar and tumbling end over end. 
That’s about all scientists could work out before 
‘Oumuamua headed out of the Solar System. 

By contrast, 2I/Borisov looks like an ordinary 
comet — and researchers are taking advantage of 
their time to study it (see ‘Dropping by’). “We are 
keenly interested in seeing what the chemistry of 
this comet is, to see if it is different from those in 
the Solar System,” says Karen Meech, an astrobiologist 
at the University of Hawaii in Honolulu. 

2I/Borisov is reddish in colour and is steadily 
spraying out dust particles. Its nucleus 
is relatively small, perhaps just one kilometre 
across, but that’s not unheard of for Solar 
System comets. 

“After ‘Oumuamua, we had to completely 
revise what we thought interstellar objects 
might be like,” says Matthew Knight, a comet 
specialist at the University of Maryland in 
College Park. “But now the second one coming 
through looks more or less, so far, like what we 
thought we might see from a comet ejected 
from another star. Now I feel a lot better.” That 
suggests that the star systems where other 
worlds form might be much like our own. 

The discoveries are coming fast. Just three 
weeks after 2I/Borisov was first seen, astronomers 
trained the 4.2-metre William Herschel 
Telescope in Spain’s Canary Islands on it and 
spotted molecules of cyanide gas streaming 
off the comet. It was the first-ever detection 
of gas from an alien visitor to the Solar System. 

On 11 October, another research team used 
a 3.5-metre telescope in New Mexico to detect 
oxygen coming off the comet. The oxygen 
probably came from water breaking apart in the 
comet’s nucleus, making this the first time that 
researchers have spotted water from another 
star system entering our own. Together, the 
amounts of cyanide and water spraying from 
the comet aren't surprising compared with what 
astronomers have seen from many other bodies. 

Astronomers are watching keenly to 
see what other molecules, such as carbon 
monoxide, they can spot coming off 2I/Borisov 
as it gets closer to the Sun and warms up, 
which will further reveal how similar — or how 
different — it is to comets in the Solar System, 
says Maria Womack, an astronomer at the 
Florida Space Institute at the University of 
Central Florida in Orlando. 

DROPPING BY 
Hailing from interstellar space, Comet 2I/Borisov is making a quick pass through 
the Solar System. Its trajectory will carry it closest to the Sun in early December. 
Despite its exotic origin, observations so far suggest Comet 2I/Borisov is much like 
other comets in the Solar System. 
*Position as of 12 October 2019. 

Early observations also suggest that 
2I/Borisov might contain relatively low 
amounts of carbon-chain molecules such as C₂ 
and C₃ (ref. 6). About 30% of the comets in the 
Solar System are similarly carbon-depleted. 
They typically come from relatively close to 
the Sun, rather than from the far reaches of 
the Oort cloud. 

As months pass and astronomers gather 
more observations of 2I/Borisov, they hope to 
be able to understand much more about the 
planet-forming disk where it originated. “It’s 
going to be really exciting to figure out what 
the building blocks of other systems are going 
to look like relative to ours,” says Malena Rice, 
a graduate student in astronomy at Yale University 
in New Haven, Connecticut. 

Researchers also hope to start unravelling 
how interstellar objects might have voyaged 
through deep space before showing up in the 
Solar System. Estimates suggest the objects 
experience many forces as they orbit the centre 
of the Galaxy, including occasional encounters 
with other stars or nudges from Galactic tides. 
Some scientists have tried to calculate which 
stars 1I/‘Oumuamua and 2I/Borisov could have 
formed around, but tracing their orbits back 
is difficult — like trying to reconstruct which 
bar a London pub-crawler started at from the 
final one they visited. 

Other questions include when we can 
expect the next interstellar visitor, and how 
different it might be from 1I/‘Oumuamua and 
2I/Borisov. Scientists didn’t expect two in such 
rapid succession after decades of fruitless 
searching. “I remain confused and astounded 
that the second object came along so fast,” says 
Robert Jedicke, an asteroid specialist at the 
University of Hawaii, who has worked to calculate 
the frequency of interstellar visitors. 
“They’re like buses,” says Alan Fitzsimmons, 
an astronomer at Queen’s University Belfast. 
“You wait decades for one to come along, and 
then two come along almost at once.” 

Some astronomers are now poring through 
archival data to see whether objects spotted 
years ago were actually interstellar visitors 
that researchers did not recognize at the time. 
And the future rate of discovery is expected to 
rise — perhaps to one interstellar object a year 
— when the Large Synoptic Survey Telescope 
goes online in Chile in 2022, from where it will 
survey the entire visible sky every three nights. 
The European Space Agency has been working 
on a spacecraft concept, known as Comet 
Interceptor, that could visit future interstellar 
objects as they wing their way past the Sun. 

Once astronomers have 10 or 20 interstellar 
objects under their belts, they should have a 
much better picture of what these deep-space 
wanderers are really like. “Eventually we'll be 
talking about the Galaxy as something in which 
we are exchanging the products of planetary 
systems,” says Bannister. “It will be an entirely 
different way of doing astronomy.” 


Alexandra Witze writes for Nature from 
Boulder, Colorado. 


Science in culture 


Books & arts 


Far sight and flotsam 


Two books explore how humans are both probing 
and polluting outer space. By Meg Urry 


The space age erupted with a flurry of 
satellites. The first two Soviet Sputniks 
launched in 1957, soon followed by the 
US Explorer 1 and Vanguard 1. In 1959, 
spurred on by cold-war tensions, 
NASA selected seven men as astronauts for 
its Project Mercury programme. (Thirteen 
women who passed the same hurdles, courtesy 
of a private, parallel programme, were vetoed.) 
Barely a decade later, NASA's Apollo astronauts 
walked on the Moon. 

The dawn of space exploration was all about 
the now. It was daring and risky, punctuated 
by engineering miracles and an air of invin- 
cibility. It wasn’t, however, focused on the 
long term. Now, six decades on from Sput- 
nik, old spacecraft are displayed in museums, 
robotic missions regularly reveal secrets from 
throughout the Universe and private compa- 
nies such as SpaceX are planning colonies on 
Mars. And the rich, crowded future of space 
is the focus of two books, one by space scien- 
tist and oceanographer Kathryn Sullivan, the 
other by space archaeologist Alice Gorman. 
Both make us think more deeply about how 
we, as humans, ought to fit into the cosmos. 

In Handprints on Hubble, Sullivan, a for- 
mer NASA astronaut who helped to launch 
the Hubble Space Telescope in 1990 and has 
been involved in updating its capabilities 
since, highlights the importance of planning 
for new instruments and infrastructure. Gor- 
man, meanwhile, applies an archaeologist’s 
perspective to space-related materials and 
activities in Dr Space Junk vs the Universe. 

Hubble has made more than one million 
observations of stars and galaxies, and 
probed dark matter and the history of the 
Universe itself, over nearly three decades. I 
worked for many years at the Space Telescope 
Science Institute in Baltimore, Maryland, 
which runs Hubble’s science operations for 
NASA. I was there for the launch, the discov- 
ery of a flaw in the primary mirror and the 
astonishing fixes that astronauts repeatedly 
pulled off. Yet Sullivan’s book makes clear 
how much I hadn’t known. Hers is a first-hand 
story, from conception to today, of the first 
space mission for which in-orbit maintenance 
and repair were integral from the start. 

Dr Space Junk vs The Universe: Archaeology and the Future 
Alice Gorman MIT Press (2019) 

Handprints on Hubble: An Astronaut’s Story of Invention 
Kathryn D. Sullivan MIT Press (2019) 

Sullivan brings alive the strenuous chal- 
lenges of space mechanics. Replacing entire 
instruments or — much harder — parts deep 
inside them during long, arduous spacewalks 
demands custom-designed tools. For exam- 
ple, Sullivan explains the evolution of the foot 
anchors that keep astronauts in place. Without 
these, turning a screw one way would make the 
astronaut and/or the spacecraft rotate in the 
opposite direction. This is the kind of detail 
that underscores the complexity of the job. 

Every step needs forethought. Once 
removed, a screw will float away if not caught, 
creating dangerous space junk that could 
damage other craft, as Gorman discusses. 
Sullivan and her colleagues spent hundreds 
of hours testing tools and procedures on a 
simulated Hubble in an underwater tank, with 
scuba-diving gear standing in for unwieldy 
spacesuits. 

The meticulously planned servicing 
missions are what have kept Hubble at the 
leading edge. Its first set of instruments was 
selected in 1978. By today’s standards, the 
technology was impossibly crude and the com- 
puter storage limited. After the first servicing 
mission in 1993, new optics compensated for 
the flaw in the mirror, the workhorse camera 
had improved detectors and the spacecraft 
had new solar panels and other vital infrastructure. 
There have been four more such missions. 
The ultraviolet spectrograph (COS) installed in 
2009 is up to 20 times more sensitive than the 
previous ones. That is equivalent to increas- 
ing the mirror diameter from 2.4 metres (the 
largest that could fit inside the Space Shuttle 
bay) to more than 10 metres (larger than any 
telescope yet built). 
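
The mirror comparison above can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes that sensitivity scales in direct proportion to a mirror's light-collecting area, which the review does not state explicitly, and the function name is made up for illustration.

```python
import math


def equivalent_diameter(diameter_m: float, sensitivity_gain: float) -> float:
    """Mirror diameter that would give the same gain through area alone.

    Assumption (not from the review): sensitivity is proportional to
    collecting area, pi * D**2 / 4, so a gain of g scales D by sqrt(g).
    """
    return diameter_m * math.sqrt(sensitivity_gain)


hubble_diameter_m = 2.4  # mirror diameter quoted in the text
cos_gain = 20            # "up to 20 times more sensitive"

print(f"{equivalent_diameter(hubble_diameter_m, cos_gain):.1f} m")
```

Under that assumption the result is a little under 11 metres, in line with the review's "more than 10 metres".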

Like Sullivan, Gorman was fascinated by 
space as a child, inspired by dark skies over the 
Australian countryside. But in common with 
many women at the time, she was discouraged 
from becoming an astrophysicist. Instead, she 
earned a PhD in archaeology and worked as a 
consultant documenting Indigenous heritage 
sites in her home country. But her gaze 
frequently turned skywards. Eventually, she 
applied her training to space exploration, 
regarding even the lowliest “space junk” as 
an important part of the historical record. 
As the “Dr Space Junk” of her book’s title, 
Gorman writes about how we should protect 
our space legacy. She describes the reef of 
orbital detritus around our planet, including 
satellites, alive and dead, embedded in a sea 
of discarded hardware and debris from space 
collisions (deliberate and otherwise), as well 
as receding planetary probes and equipment 
abandoned on the Moon. 

Astronaut Michael Good works on maintaining the Hubble Space Telescope. 

She draws parallels between archaeological 
investigations that find ancient artefacts on 
Earth and missions to catalogue objects that 
are merely decades old circling above it. At 
times, her ideas can seem fanciful — as when 
she discusses shadows left by footprints on 
the lunar surface, or speculates about future 
civilizations finding spacecraft beyond the 
Solar System. But for the most part, the book 
made me think fresh thoughts. 

Gorman reminds us how fragile our access 
to space is. Orbital debris alone poses a risk to 
every newly launched spacecraft. She warns of 
the need for nations to cooperate in preserv- 
ing and protecting the space environment, and 
points out the moral responsibility of space- 
faring nations to deal on an equal basis with 
those that are not. 

Both Sullivan and Gorman envision a future 
in which astronauts, and possibly ordinary 
citizens, live and work in space regularly. In 
that world, it will be normal to site telescopes 
in stable orbits at L2 (the second Lagrange 
point, which circles the Sun in tandem with the 
Earth-Moon system) and to upgrade them reg- 
ularly. Hubble has seen a few monster galaxies 
as they were early in the evolution of the Uni- 
verse. In future, more sensitive instruments, 
such as the James Webb Space Telescope, 
might see much smaller early galaxies and 
possibly even the first stars. 

To read these two books is to marvel at 
what we have achieved in our nascent efforts 
to inhabit space, and to recognize that we 
have barely begun that quest. Many popular 
treatments of space travel, including the films 
Apollo 13 (1995) and First Man (2018), have 
framed it as competitive derring-do. Sullivan 
and Gorman focus more on our common inter- 
ests, as humans, in knowledge and coopera- 
tion. They invite us to think anew about the 
legacy and the future of space. 


Meg Urry is Israel Munson professor of 
physics and astronomy at Yale University, and 
director of the Yale Center for Astronomy and 
Astrophysics in New Haven, Connecticut. She 
uses the Hubble, Chandra and Spitzer space 
telescopes in her research on black holes. 
e-mail: meg.urry@yale.edu 




A person with HIV in the Mae Tao Clinic in Thailand. 


How stigma subverts 


public health 


A hard-hitting study exposes the devastating effects 
of shame and discrimination. By Julie Pulerwitz 


As a public-health researcher working 
on HIV around the globe, I have seen 
the devastation that stigma can cause. 
It leads to people being shunned and 
isolated, and discriminated against 
in health care, at work and at school. And 
it inhibits them from accessing life-saving 
services and medications. As a new book by 
medical anthropologists Alexandra Brewis and 
Amber Wutich convincingly argues, stigma 
strips people of dignity and exacerbates the 
already-difficult circumstances of the poorest 
and most vulnerable. It can itself have major 
impacts on health, such as depression and 
even suicide. 




Brewis and Wutich work in low-income 
countries. Their book’s title, Lazy, Crazy, and 
Disgusting, highlights their areas of focus: 
obesity, mental illness and community 
sanitation. The authors focus on detailed, 
qualitative case studies in diverse arenas. 
These demonstrate three things: how stigma 
arises and affects the most marginalized; 
why stigma is so difficult to combat; and 
how public-health efforts can unwittingly 
fuel it. 

Lazy, Crazy, and Disgusting: Stigma and the Undoing of Global Health 
Alexandra Brewis & Amber Wutich, Johns Hopkins University Press (2019) 

It is this third issue — the unintended 
consequences of big campaigns — that forms 
their main argument. And it is a rarely heard 
and compelling one. When, for example, the 
US public-health community framed smok- 
ers as putting others at risk of getting can- 
cer from ‘second-hand’ smoke, the messages 
hit home. Social norms regarding whether it 
was acceptable to smoke changed, and many 
smokers were motivated — and managed — 
to quit. But there were also negative con- 
sequences. Smokers were blamed for their 
addiction, and people with smoking-related 
diseases (even those who had never smoked) 
were often castigated as bringing their con- 
ditions on themselves. Meanwhile, tobacco 
companies escaped criticism. 

Similarly, the authors demonstrate how 
concerns about the effects of obesity — such 
as diabetes and cardiovascular disease — have 
led to fat-shaming, depression and more. Yet 
obesity is strongly linked to socio-economic 
circumstances, and a lack of access to 
high-quality foods such as fresh fruit and vegetables. 
It is counterproductive and unfair to 
blame individuals. As obesity and associated 
conditions become increasingly prevalent 
across the global south, Brewis and Wutich 
caution against a snowball effect of harmful 
messaging and impacts. 

The book is less strong on ways forward. 
I second the authors’ calls for increased 
awareness among health practitioners, for 
tracking of stigma levels and for policy to 
be evidence-based. But, in my view, a more 
comprehensive and nuanced response 
is needed. There are important distinc- 
tions between, for example, public-health 
measures to reduce people’s internalized 
feelings of blame and shame, and legislative 
efforts to minimize ‘enacted’ stigma — that 
is, instances of discrimination. Internalized 
stigma might lead to depression, and those 
who experience it might benefit from coun- 
selling, say. By contrast, human-rights abuses 
must be countered with anti-discrimination 
policies and laws. 

Moreover, Brewis and Wutich fail to explore 
an important concept: intersecting stigmas. 
For example, a person with HIV who works in 
the sex industry and injects drugs might expe- 
rience compounded bias and discrimination. 
The authors use HIV as an example of a success 
story in which concentrated efforts from the 
global health community, such as health policies 
and mass-media campaigns, have greatly 
reduced stigma. 

But this is true only in some communities 
— especially in high-income countries where 
living with HIV has been transformed into 
a chronic illness through the use of anti- 
retroviral medications. These medications 
are often not reaching the most vulnerable, 
and in many contexts — for example, where 
drug users are criminalized and struggle to 
access health care — intersecting stigmas 
remain rampant. 

This engaging book nevertheless fills a 
significant gap in the literature by providing 
a wake-up call to scholars and practitioners 
unfamiliar with the topic. And it reminds me 
that we should all be working together to avoid 
any unintended consequences of promoting 
health. 


Julie Pulerwitz directs the HIV and AIDS 
programme at the Population Council, an 
international non-profit organization based 
in New York City. 

e-mail: jpulerwitz@popcouncil.org 


Books in brief 




The Art of Rest 

Claudia Hammond Canongate (2019) 

In 2014, journalist Claudia Hammond, presenter of BBC Radio 4’s All in 
the Mind, joined a group studying rest at London’s Wellcome Collection. 
She proposed a radio survey called the Rest Test. Responses from 
18,000 people in 135 countries yielded a top ten of restful activities, and 
they inspire the titles of her informative chapters interlacing findings 
from dozens of studies. Intriguingly, the top five are largely solitary. 
Number one is reading, which “not only allows us to escape other 
people, but simultaneously provides us with company”, she notes. 


Artificial You 

Susan Schneider Princeton University Press (2019) 

Artificial intelligence (AI) technology will raise increasingly difficult 
ethical issues, argues philosopher, cognitive scientist and self- 
confessed technotopian Susan Schneider in this demanding 
dialogue between philosophy and science. How would you feel, 
she begins speculatively, about purchasing a “Hive Mind” — a brain 
chip permitting you to experience the innermost thoughts of your 
loved ones? That presumes, however, that future AI can capture 
consciousness with computation — which she argues is unlikely. 


Are Men Animals? 

Matthew Gutmann Basic (2019) 

Anthropologist Matthew Gutmann has spent 30 years exploring 
concepts of masculinity across the United States, Latin America 
and China. “We place unreasonable trust in biological explanations 
of male behaviour,” he argues in this wide-ranging book, which 
discusses US mass killings by men, Donald Trump's presidency and 
much more. Yet, he contends, there have been no major discoveries 
of a link between testosterone and aggression since 1990, despite a 
boom in scientific articles on the topic. 


When the Earth Had Two Moons 

Erik Asphaug Custom House (2019) 

The days of the week are named after bodies in the Solar System and 
a diverse mix of Norse and Roman deities. So notes Erik Asphaug,
a planetary scientist who is part of the team behind two lunar and
planetary NASA missions. But if the planets were born out of material 
orbiting the Sun, like raindrops condensing from a cloud, why do they 
differ so much in structure and chemical composition? This detailed 
book assesses the astronomical and geological evidence on the 
origin of planetary diversity. 




Beyond the Trees 

Adam Shoalts Allen Lane (2019) 

In 1967, the centennial of Canada’s confederation, ten teams of 
canoeists paddled from central Alberta to Montreal. In mid-2017, to 
mark the 150th anniversary, explorer, historian and geographer Adam 
Shoalts travelled across the Canadian Arctic by canoe and on foot. 
His journey took him from the Yukon to Nunavut: across the terrestrial 
world’s largest expanse of wilderness outside Antarctica. It proved 
less stressful than his normal “modern, hyper-connected world”, he 
avers in his engaging, hazard-strewn account.

Andrew Robinson
Setting the agenda in research


Comment 


A collapsed building in the city of Palu in Sulawesi, Indonesia, after a magnitude-7.5 earthquake hit the region in September 2018.


Disaster-zone research 
needs a code of conduct


JC Gaillard & Lori Peek 


Study the effects of 
earthquakes, floods and 
other natural hazards 
with sensitivity to ethical 
dilemmas and power 
imbalances. 
A magnitude-7.0 earthquake rocked
Anchorage, Alaska, in late November 
2018. Roads buckled and chimneys 
tumbled from rooftops. Business 
operations were disrupted. Schools 
were damaged across the district. This was 
the largest earthquake to shake the region in a
generation, and there was much to learn. What 
was the state of the infrastructure? Might fur- 
ther quakes occur? How did people respond? 
Teams of scientists and engineers from across 
the United States mobilized to conduct field 
reconnaissance in partnership with local 
researchers and practitioners. These efforts 
were coordinated through the clearing house 
set up by the Earthquake Engineering Research 
Institute in Oakland, California, which provided
daily in-person and online briefings, as
well as a web portal for sharing data. 

But researchers are not always so 
welcome in disaster zones. After the deadly 
Indian Ocean earthquake and tsunami on 
26 December 2004, hundreds of academics 
from countries including Japan, Russia, France 
and the United States rushed to the region 
to collect perishable data. This influx of for- 
eign scientists angered and fatigued some 
locals; many declined researchers’ requests 
for interviews. The former governor of Aceh 
province, Indonesia, where more than 128,000 
people died, described foreign researchers 
as “guerrillas applying hit-and-run tactics”.


HARIANDI HAFID/SOPA/ZUMA WIRE 


Yet research on tsunami propagation and people’s
response to the event has led to improved
warnings and emergency-response plans. 

When, on 28 September 2018, an earth- 
quake and tsunami hit the Indonesian island of 
Sulawesi, dozens of researchers found themselves
unable to enter the country. Indonesian
law now requires foreign scientists to obtain 
a special visa before they can begin research. 
Data-collection protocols must be submitted 
to the government in advance and projects 
must have an Indonesian partner. Violators 
could face criminal charges and even prison. 

This incident has inflamed a smouldering 
debate among disaster researchers. Some 
scholars argue that stringent administrative 
protocols violate researchers’ rights and 
prevent the collection of crucial, potentially 
life-saving, data. Others counter that such
procedures protect survivors and preserve the 
integrity of local scientific efforts. For instance, 
concerns over studies placing undue burdens 
on overwhelmed groups — including grieving 
schoolchildren — led New Zealand to impose 
a moratorium on social-science research after
the 2011 Christchurch earthquake.

Here we argue that disaster research needs 
a culture shift. As in other branches of study
involving human participants, ethical concerns 
should have the same primacy as research 
questions. We call on the United Nations
Office for Disaster Risk Reduction (UNDRR) to 
put forward a researcher-driven ethical code 
of conduct. This should advance disaster 
research, making it scientifically rigorous as 
well as locally and culturally grounded. After all,
the UNDRR has a mandate “to ensure synergies 
among... regional organizations and activities 
in socio-economic and humanitarian fields”. 


Moral hazard 


Researchers working in disaster zones, with 
people whose culture might be different from 
their own, need to know how to interact with 
survivors as well as local officials and scholars, 
without adding to those people's problems. 
There is no universal definition of ethical 
behaviour, and only a handful of countries have 
ethically informed guidelines for post-disaster 
research. In New Zealand, guiding principles 
from the Natural Hazards Research Platform 
advise that researchers must “avoid creating 
unnecessary anxiety by speculating to locals”. 
The Philippines allows research on the trauma
caused by disasters only in exceptional cases, 
such as when affected people want to share 
their feelings as a way to process the event. 
Brazil, like Indonesia, requires all researchers 
working in the country to have a special visa
and an established local connection.

University ethics committees and national
ethical review boards are unable to fill the 
gap. They tend to focus on studies in medi- 
cine and social sciences that involve human 
participants. They have little to say on how 
to investigate a collapsed building or a com- 
promised coastal landscape. Yet studies by 
engineers or natural scientists have partici- 
pants, too: local residents, scholars, guides 
and interpreters. Tsunami researchers might
need to ask coastal dwellers about the height
of waves; structural engineers assessing a col- 
lapsed stairwell might question the building’s 
occupants about how they escaped. 


Towards a code of conduct 


Researchers equipped with an ‘ethical toolkit’ 
are better able to help affected populations
without causing harm. Following the
earthquake that struck Luzon island in the 
Philippines in April this year, research was coor- 
dinated by academics based in nearby Manila. 
They provided support deemed appropriate by 
those affected. A code of conduct could build 
on such successes and should consider the 
following three principles. 


Have a clear purpose. Researchers should 
collectively identify knowledge gaps that 
future studies will fill. They should partner 
with affected people to establish emergent 
research priorities in dealing with a disaster. 
Such collaborative engagement can help to 
clarify where and when researchers will go into 
the field, what they will study, and who should 
be on the team. For example, psychologists 
and anthropologists might study and sup- 
port local coping mechanisms; historians and 
civil engineers might collaborate to examine 
and promote resilient traditional architec- 
tural features when rebuilding homes in 
cyclone-affected areas. 

The needs of local people should be central.
Too often, research is driven by media cover- 
age and politics. Disasters in heavily populated 
areas receive the most attention, but the cumu- 
lative impacts of smaller events can be just as 
devastating. For example, after the massive 
Nepal earthquake in April 2015, the impacts on 
infrastructure and the quality of shelters were 
widely studied, and aid donors gave millions of 
dollars to rebuild parts of Kathmandu. Yet in
rural western Nepal, hundreds of villages cope 
with floods and landslides each year, unnoticed 
by the outside world. 

A researcher code can help to redress the balance.
For example, the Philippines requires that
post-disaster projects demonstrate how they 
will meet the priorities of affected communi- 
ties. New Zealand encourages researchers to 
defer collecting data unless the information will 
support responders. More relevant research 
could provide the evidence to inform and direct 
recovery funding to where needs really lie. 


Respect local voices. Wealthy countries 
account for most disaster scholarship and 
funding. For example, more than 90% of arti- 
cles published following Hurricane Katrina, 
which hit the southern United States in 2005, 
were by US researchers. By contrast, fewer
than 5% of publications on the 2010 Haiti earthquake
were led by authors based in the country
(see ‘Unequal partners’). 

Similarly, 84% of articles published between 
1977 and 2017 in Disasters, the flagship jour- 
nal in the field, were led by authors based in 
countries of the Organisation for Economic 
Co-operation and Development (OECD). Yet 
93% of the people killed by large disasters over 
the same period lived in non-OECD countries,
according to the EM-DAT disaster database.

Outside researchers — who have not had their 
lives disrupted by disaster — are positioned to 
seek funding and might overlook local work 
and partners. After Hurricane Katrina in 2005, 
local experts in urban poverty, affordable 
housing and coastal land loss were passed over 
for grants. And local and external priorities
might differ. In 2011, following the Joplin tor- 
nado in Missouri, outside academics assessed 
damage to infrastructure. By contrast, locally 
based researchers were eager to learn how to 
support emotional health after witnessing a rise 
in post-traumatic stress in children and adults.
Both are important topics, but funding streams
do not always follow local desires. 

An understanding of local languages, policies
and practices is essential and can improve
response and speed recovery. After Katrina, 
‘culture brokers’ helped survivors to make 
sense of government documents so that they 
could access aid quickly. Nonetheless, much
disaster research is still framed by narrow 
world views. Concepts such as vulnerability 
and resilience do not necessarily translate 
well. Even where equivalent terms exist, they
might be felt to be irrelevant, because natural 
phenomena such as cyclones and floods are 
not always seen as hazards. In some religious 
traditions, volcanic eruptions are thought to
reflect the emotions of deities, for instance. A 
lack of recognition of this nuance can affect the 
outcomes of risk-perception research as well as 
early-warning processes. 

More discussions between disaster research- 
ers inside and outside affected areas would 
shed light on these issues and could inform a
more holistic research agenda. The Geotechnical
Extreme Events Reconnaissance Association’s
ethics protocol might serve as a starting
point. It encourages engineers to adhere to 
“high standards of professionalism” and to 
be “respectful of local customs, traditions, 
privacy, and rights of affected individuals” 
(see go.nature.com/32kptno). Government 
agencies, companies and non-governmental 
organizations should also be involved in such 
conversations, given that they are increasingly 
engaged in post-disaster data collection.


Coordinate locals and outsiders. Projects that 
are uncoordinated can become irrelevant or 
redundant, and might overwhelm local people 
and responders. In 2013, survivors of Typhoon 
Yolanda (also known as Haiyan) in Tacloban in
the Philippines were deluged with question- 
naires, when their immediate concerns were to 
secure housing, food, clothing and education. 

After Hurricane Harvey in the United States 
in 2017, officials at emergency operations centres
struggled to decipher the credentials of
dozens of researchers who descended on 
Houston, Texas, requesting access. Emer- 
gency managers also had to spend precious 
time revising researchers’ survey questions 
to put them in a local context.

Foreign scientists sometimes approach local 
researchers to serve as translators or assistants. 
These locals have little power to direct the 
research strategy, even though their insights
are valuable. They might feel unable to be critical
even when they know the questions are
wrong-headed. Even when they make substan- 
tial contributions, they might still be relegated 
to co-authorship — or no authorship — rather 
than being listed as the primary author.

Incoherent data and findings might confuse
authorities and delay decisions. Volcanolo- 
gists still argue about exactly when local com- 
munities should be evacuated. To help, the 
International Association of Volcanology and 
Chemistry of the Earth’s Interior has produced
guidelines on the roles and responsibilities of
local and outside scientists, local authorities 
and the media. 

Local researchers need to be identified 
quickly in a crisis. As a start, the Social Science
Extreme Events Research (SSEER) network 
has produced a global map of social scien- 
tists who study hazards and disasters (see 
go.nature.com/2qfwezc). Regional SSEER 
councils ensure that those researchers remain 
involved after the event. 


First steps 


Discussions regarding a shared code of 
conduct could start through collaborative 
disaster-research initiatives that are under 
way worldwide. These have established 
strong coordinating structures and forums 
for information sharing, and include those 
in Latin America, Africa, the European Union 
and the Asia-Pacific region. They could also 
build on disaster-response initiatives from
the medical sciences.

UNEQUAL PARTNERS
Authorship of papers on disaster research can be dominated by researchers
outside the country affected, meaning that local expertise might be overlooked.


The US National Science Foundation (NSF) 
now supports several extreme-events recon- 
naissance and research networks. These 
advance coordination and set scientific agen- 
das in geotechnical and structural engineering, 
social sciences, near-shore systems, operations 
and systems engineering, and interdisciplinary 
science. The NSF-funded CONVERGE initiative 
(of which L.P. is the principal investigator) 
brings together leaders from these networks 
and major NSF facilities to support the devel- 
opment of guidance and data-sharing by haz- 
ards and disaster researchers. Other resources, 
including a set of free online training modules,
are also available. These NSF initiatives are 
open to researchers globally, but they are led 
by researchers at US institutions. 

Most countries do not provide ethical guid- 
ance for researchers, and universities have 
widely varying standards for the protection of 
study participants. The UNDRR is a trusted convener
of scientists and practitioners globally. It
could serve as a focal point for the development
and implementation of an ethical code of con- 
duct for researchers in disaster zones. As disas- 
ters unfold around the globe, the need for such 
a code of conduct becomes ever more urgent.
Readers respond


Correspondence 


Chile: democratizing 
policies online 


Chile became a political 
hotbed in a matter of days last
month. Amid the chaos, people 
demanded reforms to the 
country’s privatized pension 
and health-care systems, a 
new constitution, and punitive 
measures for tax dodgers and 
companies involved in price- 
fixing. But their voices need to 
be aggregated if they are not 
to be lost in the din of rallies 
or fragmented into thousands 
of tweets. 

To this end, we created 
the experimental platform 
Chilecracia, using 
crowdsourcing methods 
already validated in academia. 
Examples include MIT’s Place 
Pulse (P. Salesses et al. PLoS ONE 
8, e68400; 2013) and Moral 
Machine (E. Awad et al. Nature 
563, 59-64; 2018). Chilecracia 
pairs policy proposals and asks: 
“What would you prioritize?” 
Within 10 days, more than 
120,000 people had indicated at 
least one preference, amounting 
to more than 7 million votes. The 
data are helping us to compile 
detailed networks of policy
preferences (see chilecracia.org).


Chilecracia is being updated 
weekly with the help of a team
of policy experts. We have 
received requests from several 
countries to deploy regional, 
organizational and national 
instances of the system. Our 
findings add to the growing 
literature on such surveys 
(see go.nature.com/2qiwoja) 
and offer insight into online 
crowdsourced participation 
systems in politically active 
situations. 


César A. Hidalgo University of 
Toulouse, France. 
cesifoti@gmail.com 


C.A.H. declares competing 
financial interests; see 
go.nature.com/2nxscdc. 




Huge demonstrations have swept through Chile since mid-October. 


Chile: science could 
tackle social unrest 


Finding solutions to Chile’s 
current social turmoil will 
demand efforts from every 
sector, including the research 
community (see Nature
575, 265-266; 2019). Our
contribution will depend on 
government support for a more
ambitious, participatory policy 
for science, technology and 
innovation. 

In the past, Chile’s science 
policy has focused on boosting 
productivity and economic 
growth. However, the problems 
highlighted by the latest 
social unrest are unlikely to be 
solved just by increasing gross 
domestic product (see also 
P. A. Besnier Nature 511, 385; 
2014). 

Creating a more
comprehensive science and 
innovation policy will also 
require a new, improved way
of doing politics. More players 
must be involved, including 
citizen representatives, to allow 
their perspective to optimize 
future policies for society’s 
well-being. 


Pablo Astudillo Besnier 
Autonomous University of Chile, 
Santiago, Chile. 
pablo.astudillo@uautonoma.cl 


Don’t bury 
hidden treasure 


There is much to applaud in 
the EON-ROSE (Earth-system 
Observing Network-Réseau 
d’Observation du Système
Terrestre) project to understand 
Canada’s geology and to
find geothermal energy (see
Nature 574, 463-464; 2019). 
But as your heading ‘Hidden 
treasure’ indicates, there are 
also commercial implications 
in prospecting for mineral 
deposits. 

The boundaries between 
science, economics and politics 
are often invisible. Scientific 
ventures should always be 
open about their potentially 
commercial objectives, as in the
EON-ROSE case, particularly if 
they are financed by funding 
agencies and indirectly by 
taxpayers. 

In my view, scientists should 
reject outright any subsidies 
that support the search for and 
development of carbon-based 
energy and of environmentally 
destructive mineral extraction. 


Talan İşcan Dalhousie University,
Halifax, Canada. 
talan.iscan@dal.ca 
Writer’s secret?
No interruptions 


Your piece ‘Day in the life of a 
24-hour global news factory’ 
revived memories of my only 
visit to the Nature office (see 
go.nature.com/2k9scd1). The 
occasion was in the early 1980s, 
when John Maddox was in his 
second term as editor-in-chief 
and he invited me to report 
news stories from India for
the journal. He wrote many
himself, one of which included 
an interview with Indira Gandhi, 
the country’s prime minister at 
the time (see Nature 308, 582; 
1984). 

When I met the great man 
again, he was in hospital being 
treated for a leg injury. He
passed his days there avidly 
reading, writing and editing 
as usual. I enquired after his 
secret of being able to write 
such insightful editorials on a
host of topics — ranging from 
physics to philosophy — week 
after week. His reply was that 
he always firmly shut his office 
door, allowing no phone calls or 
other interruptions until he had 
completed his next editorial. 


Killugudi Jayaraman Bangalore, 
India. 
killugudi@hotmail.com 


HOW TO SUBMIT 


Correspondence may be
submitted to correspondence@nature.com
after consulting the
author guidelines and section
policies at go.nature.com/cmchno.
Expert insight into current research


News & views 


Palaeontology 


Fossil ape hints at how 
bipedal walking evolved 


Tracy L. Kivell 


Approximately 11.6-million-year-old fossils reveal an ape 
with arms suited to hanging in trees but human-like legs, 
suggesting a form of locomotion that might push back the 
timeline for when walking on two feet evolved. See p.489 


Ever since Charles Darwin’s work provided 
the basis for understanding human evolu- 
tion, there have been long-standing questions 
regarding when, why and how our early human 
ancestors began to walk on two feet. The
commitment to terrestrial bipedalism, char- 
acterized by skeletal adaptations for walking 
regularly on two feet, is a defining feature that
enables the assignment of fossils to the homi- 
nin lineage — which comprises all species more 
closely related to humans than to chimpanzees 
(Pan troglodytes) or bonobos (Pan paniscus), 
our two closest living relatives. On the basis 
of fossil findings, some of which are more 
controversial than others, the answer to
the ‘when’ question is thought to be between 
7 million and 5 million years ago at the end of 
the Miocene epoch (which lasted from about 
23 million to 5 million years ago). 

Answering the questions of why and how 
hominin bipedalism evolved depends a lot 
on what kind of locomotion was being used 
before terrestrial bipedalism evolved. Did it 
evolve from an ancestor that lived mainly in 
trees, or were these ancestors already walking 
on all fours on the ground and subsequently 
evolved to stand up and walk on two feet? On 
page 489, Böhme et al. report the discovery
of an ape species called Danuvius guggenmosi
from the middle of the Miocene. This species
moved around in a previously unknown way,
which the authors suggest could provide a 
model for the type of locomotion from which 
hominin bipedalism evolved. 

Questions about the origin of homi- 
nin bipedalism and how the last common 
ancestor of humans, chimpanzees and bon- 
obos might have moved are conventionally 
addressed using either a top-down or a bottom-up
approach (Fig. 1). Darwin and many
palaeoanthropologists favoured the top-down
approach, examining living primates,
particularly the great apes, for clues to
how bipedalism evolved. African apes —
chimpanzees, bonobos and gorillas (of the 
genus Gorilla) — go into the trees to eat, sleep 
and when they need protection, but spend 
most of their time on the ground, using their 
knuckles for walking. Given our close genetic 
relationship to these apes, and because we also 
share certain features of our hands and feet 


Humans Bonobos Chimps 


Last common © 
ancestor of 
humans, chimps 
and bonobos 


Gorillas 


with them, some have argued that hominin 
bipedalism evolved from a knuckle-walking 
ancestor, or a more generalized quadruped
lacking knuckle-walking specializations,
that divided its time between the ground and 
the trees. By contrast, others have noted that 
the way that orangutans (of the genus Pongo) 
move bipedally in trees, and the mechanical 
similarities between how apes use their legs for 
climbing and how humans use theirs for walk- 
ing, suggest that bipedalism evolved from an 
ape ancestor that was previously committed 
to life in the trees.

Although logical, this top-down approach 
is constrained, as Darwin acknowledged, to
examining evidence from the few remain- 
ing living ape species. However, one of the 
earliest potential hominins for which we have 
the most fossil evidence — the approximately 
4.4-million-year-old Ardipithecus ramidus — 
is argued to be distinctly unlike living great 
apes in its anatomy, which suggests that the 
African apes and Asian orangutans we know 
today are actually quite specialized in their 
locomotor behaviours compared with their 
earlier ancestors. Each living ape species is
a result of its own long, evolutionary history,
and, in the case of African apes, one that we
often forget because there is so little fossil
evidence of it. This absence of fossil information
to reveal how African apes evolved makes
questions about the nature of our common
ancestor even trickier to answer.

Figure 1 | The evolution of bipedalism. In the branch of the evolutionary tree that splits from our last
common ancestor with chimpanzees (Pan troglodytes) and bonobos (Pan paniscus), humans and our extinct
hominin relatives have a skeleton adapted for regular walking on the ground using two feet. A top-down
approach to assessing how our early ancestors might have evolved bipedalism focuses on possible modes
of ancestral locomotion by considering how living great apes move around. For example, African apes —
chimps, bonobos and gorillas (of the genus Gorilla) — use knuckle-walking more frequently on the ground
than in trees, and these apes and orangutans (of the genus Pongo) also climb and use suspensory locomotion
in trees. However, fossils of the ancient potential hominin Ardipithecus ramidus suggest that living apes
might have evolved quite specialized locomotion compared with their earlier ancestors. A bottom-up
approach focuses instead on ancient ape fossils that pre-date our last common ancestor, such as those of
the genus Nacholapithecus or Sivapithecus. However, the clues uncovered from such fossils can be difficult
to interpret. Böhme et al. present fossils of a previously unknown ape called Danuvius guggenmosi, which
the authors suggest provides a good model for the type of locomotion from which bipedalism might have
evolved. The branch-point timings shown are approximate.

Other palaeoanthropologists address the 
question of bipedal origins from a bottom-up
approach instead, looking to the approximately
30 genera of fossil apes that have been
identified from the Miocene of Africa, Asia and
Europe as potential models for what our last
common ancestor might have looked like.
However, these apes show a hotchpotch of 
skeletal adaptations, with features found in 
combinations that are unlike anything we 
see in living primates, and that often leave 
us guessing about how these animals moved 
around and how much time they spent in trees
or on the ground.

For example, a genus of fossil ape called 
Nacholapithecus had a monkey-like body 
but unusually large forelimbs and long 
toes, whereas another ape genus, Sivapithecus, 
had an orangutan-like face, an ape-like 
shoulder, and a monkey-like elbow and 
pelvis. Such characteristics suggest
odd combinations of arboreal suspension 
(hanging from tree branches), quadrupedal 
movements and body postures that are dif- 
ficult to imagine today, and which make it 
hard to interpret these creatures’ probable 
locomotion patterns.

Böhme and colleagues add to this amazing
Miocene diversity by presenting approximately
11.6-million-year-old fossils of
D. guggenmosi. The authors interpret the
shape of the D. guggenmosi fossils as indicating
a type of previously unknown movement that
they term extended limb clambering, which 
combines adaptations of both suspension in 
the trees and bipedal locomotion. This makes 
it a good possible model of locomotion for the
last common ancestor. 

The teeth of D. guggenmosi identify it as 
belonging to a group of fossil ape species 
called dryopithecins that have been found 
from the mid- to late Miocene in Europe and 
that some consider to be ancestral to African 
apes. Living African ape species inhabit the
equatorial region of Africa, but, during certain 
times of the Miocene, many ancestral great 
apes were living throughout Europe and Asia 
and migrating both to and out of Africa. Some
researchers suggest that the dryopithecins 
show features found in chimps and gorillas 
today and therefore make good candidates 
for the ancestors of living African apes. The
D. guggenmosi skeleton is unique compared 
with other dryopithecin specimens, both in 
its preservation of two, almost complete, 
limb bones — an ulna (a forearm bone) and a
tibia (a leg bone) — and in the combination of 
characteristics it displays. Böhme et al. focus
their attention on a baboon-sized and probably
male partial skeleton. As well as the ulna and
tibia, the skeleton includes some vertebrae,
a partial thigh bone (femur), and hand and
foot bones.

Figure 2 | Danuvius guggenmosi. Böhme and
colleagues instructed the artist Velizar Simeonovski
to make an illustration of what this species might
have looked like.

The length of the ulna relative to the tibia 
shows that the forearm of D. guggenmosi was 
long relative to the leg, similar to a bonobo’s
form. Combined with a flexible elbow and
hand bones indicating a powerful, grasping 
thumb and curved fingers, the forelimb has 
the telltale signs of arboreal suspension found 
in all living great apes.

However, the lower limb of D. guggenmosi 
tells a different story, and one that is more 
reminiscent of human lower limbs than of 
those of other great apes. The shape of the 
joints of the femur and tibia suggests the use of 
extended (upright) hip and knee postures that 
differ from the bent hips and knees that living 
African apes use when they occasionally walk
bipedally on the ground or in trees. The top 
of the tibia is reinforced, and the ankle joint 
is stable, properties that are adaptations for 
resisting the higher load placed on the lower 
leg when moving on two limbs instead of 
four. But the foot has a long, robust big toe 
that would be good for grasping, suggest- 
ing that D. guggenmosi might have walked 
flat-footed on branches (Fig. 2). Whether or 
not it regularly walked bipedally on the ground
is less clear. 

Together, the mosaic features of 
D. guggenmosi arguably provide the best 
model yet of what a common ancestor of 
humans and African apes might have looked 
like. It offers something for everyone: the fore- 
limbs suited to life in the trees that all living 
apes, including humans, still have; lower limbs 
suited to extended postures like those used by 
orangutans during bipedalism in the trees;
and further specialization of such features of 
the lower limbs in humans to enable habitual 
terrestrial bipedalism. 

If it is accepted that the locomotor 
behaviours observed in living great apes and 
humans evolved from an ancestor that used 
extended limb clambering, this would answer 
the question of what kind of early locomotion 
underlies our bipedal origins. And that would 
get us closer to answering why and how our 
human ancestors became less dependent on 
life in the trees and fully embraced two-footed 
terrestrial locomotion. Until more fossil evi- 
dence of how African apes evolved is found, 
a bottom-up approach from the Miocene 
is probably our best means of deciphering 
the evolution of one of our most defining 
human features. 


Tracy L. Kivell is at the School of 
Anthropology and Conservation, University of 
Kent, Canterbury CT2 7NR, UK, and at the Max 
Planck Institute for Evolutionary Anthropology, 
Leipzig, Germany.


Snapshots of a genetic cut-and-paste


Orsolya Barabas 


Transposase proteins mediate the movement of ‘parasitic’ 
DNA segments in genomes. A series of structures of a
transposase catches it in action, and highlights how these 
proteins evolved for use in immune systems. See p.540 


Our genetic material is littered with parasitic 
DNA sequences, known as transposons, which
promote their own propagation and transmis- 
sion, rather than their host’s. Their move- 
ments (transposition) within and between 
genomes have profound consequences for 
the genetic code — sometimes leading to dis- 
eases, but also driving genetic diversity and 
evolution’. Transposase proteins mediate 
this movement by executing all the required 
chemical reactions. 

Strikingly, these proteins have been 
repeatedly repurposed throughout evolu- 
tion to produce new biological functions 
that benefit the host. A prime example is the 
vertebrate immune system, in which transpo- 
sase-like RAG proteins help to assemble new 
genes from three pools of interchangeable 
DNA parts (known as V, D and J gene segments).
This process is called V(D)J recombination, 
and equips immune cells with a diverse set of 
sensors that can recognize many threats’. On 
page 540, Liu et al.* report a series of structures
of a transposase that is an ancestor® of RAG, 
casting light on the evolutionary history of 
these proteins. 

Transposases must recognize several DNA 
sites, and then cut and join them in the proper
order®. To understand this multi-step process, 
we need to visualize the structures of the 
molecular machinery involved at all stages, 
which is a major technical challenge. Liu et al.
have now drawn on a powerful combination of
two techniques — X-ray crystallography and 
single-particle cryo-electron microscopy 
— to picture several steps of transposition 
in remarkable detail, thereby providing a 
molecular ‘movie’ of the process. 

The authors’ achievements build on many 
years of structural studies of transposases’ ™ 
and RAG” ®, providing an increasingly com- 
plete view of their functions and helping to 
connect the dots between the ‘selfish’ DNA 
rearrangements of transposases and the 
essential functions that evolved from them. 
Transposases are now known to have a catalytic
core unit and diverse extra parts that bind
to DNA or control transposase function®. They
usually act in pairs, with each dimer holding 
two segments of the transposon DNA. 

The transposase studied by Liu et al. 
mediates the movement of a transposon 
called Transib, and was isolated from a moth 
(Helicoverpa zea). Appropriately, the shape of 
the transposase complex resembles that of a 
moth: each ‘wing’ comes from one of the two 
transposase molecules in the complex, and is
formed mainly from a protein region called the
zinc-binding (ZnB) domain. The wings provide 
many of the interactions with the DNA, which 
forms the ‘antennae’ (Fig. 1). 

As with many transposons, Transib moves 
by a cut-and-paste process: its transposase 
cleaves it out of the genome, using a cata- 
lytic core present in many transposases, and 
inserts it elsewhere in the genome”. Liu and 
co-workers’ structures of five steps in Transib
transposition now reveal the remarkable 
conformational changes in the protein during
this process. 

Perhaps most notably, the authors find that 
the ZnB wings move constantly: they unfurl 
when transposon DNA first arrives, and then 
close and open again during the rest of the 
process. This ‘flapping’ accompanies some 
impressive DNA acrobatics, which brings dif- 
ferent DNA parts into the protein’s core for 
ordered cleavage and joining. Remarkably, 
the ZnB domains help to capture not only the 
transposon, but also the target molecules into 
which Transib will be inserted — first opening 
to make space for the molecules, and then 
closing to plug the region of the target DNA 
into which Transib will be integrated into the 
core. The structures also show that the end 
section of the protein (the carboxy-terminal
tail; CTT) contains three short α-helices
that form an accordion-like structure, which 
connects the moving wings to the complex’s 
‘body’. Previously reported transposase struc- 
tures®™ have revealed similar overall features, 
but the movements in the Transib transposase 
are much more extensive. 

RAG protein complexes consist of two 
transposase-like RAG1 proteins and two 
RAG2 proteins. In these complexes, ZnB is 
present in RAG1, but is more fixed than in
the Transib transposase; and no part is pres- 
ent” that is analogous to CTT. RAG2, which is 
essential in V(D)J recombination but absent 
in most transposases, sits above the wings 
of RAG1 and holds a large part of the DNA” — 
much as ZnB and CTT do in the Transib 
transposase. 

Figure 1 | Flapping of a transposase complex. Liu et al.‘ report a series of structures of the transposase
enzyme that mediates the movement of the Transib transposon (a parasitic genetic element) within
genomes; the first structure shown was obtained using X-ray crystallography, and the others were obtained
using single-particle cryo-electron microscopy. The transposase forms a dimeric complex that is roughly
moth-shaped. The ‘wings’ unfurl to capture transposon-containing DNA, and then close again as the
catalytic domain makes the first cut to cleave the transposon out of the DNA. The wings open again after the
second cut, allowing target DNA (the DNA into which the transposon will be inserted) to be captured and the
transposon to be inserted. The DNA sequences differ in length in some of the panels.

Unlike transposases, the RAG complex cuts
and removes the distinct DNA sequences
found between the V, D and J gene segments,
tightly coordinating the process to ensure 
that different types of segment are subse- 
quently connected. In cells, RAG also largely 
stops the removed DNA from being reinserted 
elsewhere in the genome, to prevent poten- 
tially harmful changes to the genetic code. 
But how did these functions evolve? A log- 
ical proposal implicates RAG2, but a recent 
study" of ProtoRAG — a relative of RAG found 
in invertebrates that contains RAG2 but still 
acts as a transposase — shows that things are 
more complicated. Elements in both RAG1 and
RAG2 help to coordinate DNA cleavage and 
prevent insertion. 

Liu and colleagues’ findings cast fresh light 
on the role of RAG2, showing that it carries out
many of the functions of ZnB, but increases 
the rigidity of the whole RAG complex, com- 
pared with that of the transposase complex. It 
binds the DNA at the V, D and J segments more
tightly than ZnB binds at the transposon, and 
does not undergo such large conformational 
changes (which can require a lot of energy, 
and thus reduce efficiency). This increased 
rigidity and tight binding might help to ensure 
the strict molecular coordination required 
for V(D)J recombination. It might also pre- 
vent release of cleaved DNA segments and/or 
stop the wings from reopening to accept any 
other DNA molecules — thereby preventing 
removed DNA from being reinserted elsewhere
in the genome. If the wings do not open,
then any incoming DNA would have to bend 
itself to an angle of about 150° before entering 
the protein, which is not easily done. 

Note that Liu et al. were not able to directly 
observe the structure of the transposase in 
complex with intact target DNA. It therefore 
remains to be seen whether target DNA first 
binds to the transposase in a relaxed form 
and is then forced into a severe 150° bend. 
The authors also did not observe a complex 
in which the transposase binds intact trans- 
poson DNA such that the catalytic core is 
close enough to the ends to cleave them; 
instead, the authors observed intact trans- 
poson DNA bound with its ends away from 
the catalytic centre. In RAG, a large twist in 
the DNA is needed to position its breakpoints 
accurately for the cuts". A similar twist might
occur in Transib, but other explanations are 
also possible. 

Efforts are now needed to define the exact 
functions of RAG2. Curiously, the cell-free RAG 
complex can readily insert excised DNA into 
another DNA molecule’”’’ (a target DNA). 
Structures of RAG with a bound target 
DNA must therefore be obtained — ideally, 
both with the intact target and after inser- 
tion. These structures will show whether 
the target DNA becomes as sharply bent 
as it does in the Transib transposase, and 
reveal how RAG2 affects the binding of target
DNA and its insertion of excised DNA.

Other proteins might be needed to promote 
the function of RAG. This possibility has pre- 
viously been investigated, but the availabil- 
ity of new structures and methods provides 
further opportunities for research. For exam- 
ple, large molecular assemblies can now be 
studied inside cells using a technique called 
electron tomography”, and molecular inter- 
actions can be probed with advanced mass- 
spectrometry methods”. Analysis of genomic 
data from different species will also be helpful 
in identifying ancestors of RAG proteins other
than ProtoRAG and the Transib transposase, 
and thereby exploring their evolutionary 
history. Such research will help to explain how 
parasitic genetic elements can be repurposed 
for crucial biological functions.


Extreme emission seen
from γ-ray bursts


Bing Zhang 


Cosmic explosions called γ-ray bursts are the most energetic
bursting events in the Universe. Observations of extremely
high-energy emission from two γ-ray bursts provide a new way
to study these gigantic explosions. See p.455, p.459 & p.464 


Astrophysical explosions known as γ-ray
bursts (GRBs) can release in one second the 
amount of energy that the Sun will produce 
in its entire lifetime’. The emission from 
GRBs covers a broad stretch of the electro- 
magnetic spectrum and occurs in two stages: 
the prompt-emission phase and the after- 
glow phase. The main emission mechanism is 
thought to be synchrotron radiation, whereby 
the gyration of energetic electrons around 
magnetic-field lines releases photons. Until 
now, emission from GRBs has been observed 
only at energies below 100 gigaelectronvolts 
(GeV). Three papers in this issue** report 
observations of γ-rays that have energies
above 100 GeV from two bright GRBs, dubbed 
GRB 190114C and GRB 180720B. 
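
To put the opening comparison in numbers, the following is a rough back-of-envelope sketch rather than a figure taken from the papers. It assumes a solar luminosity of about 3.8 × 10^26 watts and a solar lifetime of about 10 billion years; both are round, textbook-level values, and the constant and function names are purely illustrative.

```python
# Back-of-envelope estimate of the Sun's lifetime energy output.
# ASSUMPTIONS (not from the article): the round textbook values below,
# and a constant luminosity over the Sun's whole lifetime.
SOLAR_LUMINOSITY_W = 3.8e26     # approximate present-day solar luminosity, in watts
SOLAR_LIFETIME_YEARS = 1.0e10   # approximate lifetime of the Sun, in years
SECONDS_PER_YEAR = 3.156e7      # about 365.25 days


def solar_lifetime_energy_joules() -> float:
    """Energy the Sun would emit over its lifetime at constant luminosity."""
    return SOLAR_LUMINOSITY_W * SOLAR_LIFETIME_YEARS * SECONDS_PER_YEAR


if __name__ == "__main__":
    energy = solar_lifetime_energy_joules()
    print(f"Sun, whole lifetime: about {energy:.1e} J")
```

On these assumptions the total comes to roughly 10^44 joules, which is the scale of energy that a GRB is said to release in about one second.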

The Major Atmospheric Gamma Imaging 
Cherenkov (MAGIC) Collaboration? (page 455) 
detected photons in the teraelectronvolt range 
(1 TeV is 10³ GeV) from GRB 190114C, using the
MAGIC telescopes at La Palma, Spain. The first 
detections started about one minute after
the burst triggered NASA’s two spaceborne
GRB detectors: the Burst Alert Telescope on 
board the Swift satellite and the Gamma-ray 
Burst Monitor on board the Fermi satellite. 
The high-energy photons continued to rain 
down on the MAGIC telescopes for about 
20 minutes, with the flux decreasing rapidly 
over this time. The MAGIC Collaboration and 
colleagues? (page 459) detected this GRB using 
several other ground-based and space-borne 
telescopes. When combined with the MAGIC 
data, this rich data set allowed the authors to 
model the event comprehensively and study 
how the TeV emission was produced. 

Abdalla et al.* (page 464) detected photons
of energies above 100 GeV (but below 1 TeV) 
from GRB 180720B, using the High Energy 
Stereoscopic System (HESS) array of tele- 
scopes in Namibia. Although these photons 
were lower in energy and fewer in number than 
those observed from GRB 190114C, they were 
detected from deep in the afterglow phase 
(10 hours after the GRB was triggered and
lasting for 2 hours). The flux and maximum
energy of the afterglow emission both 
decrease over time, owing to deceleration of 
the jets — the two narrow, oppositely directed 
channels through which most of the explosive 
energy of a GRB is released. Consequently, the
detection of such high-energy photons deep
in the afterglow phase is also groundbreaking.

The MAGIC and HESS observatories both 
use an array of optical telescopes called 
imaging atmospheric Cherenkov telescopes 
(IACTs), which are designed to detect γ-rays
in the very-high-energy range (roughly from 
30 GeV to 100 TeV). More precisely, the IACTs 
detect the light (known as Cherenkov radiation)
that is produced when such γ-rays hit
Earth’s atmosphere and produce a shower of 
charged particles. These facilities have been 
operating for more than a decade. GRBs, as 
the most powerful explosions in the Universe, 
have been one of the main observational 
targets, but, until now, have evaded detection. 
The current results are therefore a triumph for 
these observatories. 

The discoveries are also a triumph for GRB 
theories. Theoretically, there are three mechanisms
by which high-energy γ-rays can be
produced during the afterglow phase’. The 
first is synchrotron radiation from electrons 
accelerated by the external shock — the shock 
wave that is generated when the exploded 
matter collides with surrounding interstellar 
gas. This emission component has a maximum 
energy that depends only on the Lorentz factor
of the outflow (a parameter that denotes how 
fast the external shock is moving). To reach 
energies above 100 GeV, the Lorentz factor 
must be greater® than about 1,000, which is 
only marginally possible. Observations show 
that the Lorentz factor of GRB jets is usually 
a few hundred during the prompt-emission 
phase and decreases over time during the 
afterglow phase’. 
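
The scaling described above can be sketched numerically. This is an illustration only: it assumes that the maximum synchrotron photon energy is simply proportional to the Lorentz factor, with a coefficient of about 100 megaelectronvolts chosen to reproduce the "above 100 GeV needs a Lorentz factor of about 1,000" figure quoted in the text. The coefficient, the neglect of redshift and the function names are assumptions, not values taken from the papers.

```python
# Toy scaling for the maximum electron-synchrotron photon energy.
# ASSUMPTION: E_max = CAP_MEV * Lorentz factor, with CAP_MEV = 100 MeV picked
# to match the article's "100 GeV requires a Lorentz factor of about 1,000".
# Redshift and all model-dependent factors are ignored.
CAP_MEV = 100.0  # hypothetical round coefficient, not a measured value


def max_synchrotron_energy_gev(lorentz_factor: float) -> float:
    """Maximum observed synchrotron photon energy (GeV) under the toy scaling."""
    return CAP_MEV * lorentz_factor / 1000.0


def lorentz_factor_needed(energy_gev: float) -> float:
    """Lorentz factor required to reach a given photon energy (GeV)."""
    return energy_gev * 1000.0 / CAP_MEV


if __name__ == "__main__":
    for gamma in (100, 300, 1000):
        ceiling = max_synchrotron_energy_gev(gamma)
        print(f"Lorentz factor {gamma}: ceiling of about {ceiling:g} GeV")
```

With these assumed numbers, a Lorentz factor of a few hundred gives a ceiling of a few tens of GeV, which is why electron synchrotron radiation alone struggles to account for photons above 100 GeV.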

The second high-energy radiation mecha- 
nism is synchrotron radiation from protons 
accelerated by the GRB external shock. This 
emission component can, in principle, contain
TeV γ-rays. However, because protons are
much less efficient emitters than are electrons, 
the conditions for this mechanism to be dom- 
inant are rather demanding. Finally, the third 
mechanism is called synchrotron self-Comp- 
ton’ (SSC), whereby the same accelerated 
electrons that emit synchrotron photons can 
scatter off some of these photons, resulting 
in photons that have energies above 100 GeV 
(Fig. 1a). For typical shock-microphysics 
parameters inferred from afterglow model- 
ling of other GRBs, it is expected that the SSC 
mechanism should usually be the main way in 
which high-energy γ-rays are produced’.

Figure 1 | Emission from a γ-ray burst. a, Three papers” * report the detection of high-energy radiation from
astrophysical explosions known as γ-ray bursts (GRBs). The explosive energy from a GRB is thought to be
channelled into two narrow jets. Photon emission occurs in two stages: the prompt-emission phase and the
afterglow phase. In the afterglow phase, low-energy photons are thought to be generated by a mechanism
called synchrotron radiation. High-energy photons are thought to be mainly produced through a process
dubbed synchrotron self-Compton (SSC), whereby the scattering of synchrotron photons off energetic
electrons gives the photons a boost in energy*”. b, One key prediction of the SSC mechanism is that there
should be two ‘humps’ in the spectral energy distribution of the afterglow spectrum: one corresponding
to synchrotron photons and the other to SSC photons”. Results from the three papers firmly establish the
existence of such an SSC component.

One key prediction of the SSC mechanism
is that there should be two ‘humps’ in the
spectral energy distribution of the afterglow
spectrum?’ (Fig. 1b). Such a two-hump
structure has been commonly observed for
high-energy jets launched from supermassive 
black holes known as blazars®, and the same 
structure has been confidently expected for 
GRBs. Previous observations of high-energy 
afterglows of GRBs using the Large Area Tele- 
scope on board the Fermi satellite have not 
convincingly shown the existence of a second
hump in the spectral energy distributions’. 
However, some tentative evidence has been 
collected from another bright burst’®”, 
GRB 130427A. 

The multi-wavelength observations 
of GRB 190114C obtained by the MAGIC 
Collaboration and colleagues have firmly 
established, for the first time, the existence 
of the SSC component in a GRB afterglow’. This
conclusion has been confirmed by independ- 
ent modelling from other groups” ©. The dou- 
ble-hump feature is comparatively less clear in 
the spectral energy distribution obtained by 
Abdalla et al. for GRB 180720B. However, in 
the late afterglow phase, electron synchrotron 
radiation cannot produce photons of energies 
above 100 GeV without the need to introduce 
exotic particle-acceleration mechanisms. As 
a result, the SSC mechanism is the preferred 
explanation for the observed spectral energy 
distribution*”. 

Why did it take so long to detect a theoreti- 
cally expected common spectral component? 
The observation of a GRB by an IACT requires
that the burst is bright (to produce a suffi- 
cient number of high-energy photons) and 
nearby (to avoid absorption of the photons 
by the infrared background radiation in the 
Universe). Furthermore, the telescope needs 
to have the correct observational conditions. 
For instance, a particular GRB would not be 
detected by an IACT if the event occurred during
the daytime, in poor weather or in an area
of the sky that was not accessible by the telescope.
Nevertheless, the breakthrough results
reported in the current papers suggest that,
with dedication and probably a bit of luck, a 
revolutionary discovery can be made. 

Now that photons of energies above 100 GeV 
have been detected from GRBs, it is expected 
that such detections will become routine in the
future — especially with the full operations of 
the available IACTs and of observatories that 
use other detection techniques, such as the 
High-Altitude Water Cherenkov Observatory 
in Mexico. The field will also greatly benefit 
from the operations of facilities such as the 
future international Cherenkov Telescope 
Array and the Large High Altitude Air Shower 
Observatory in Daocheng, China. As history 
has repeatedly shown, the opening of a new
spectral window in GRB research always 
reveals many treasures for researchers to 
mine. This spectral window at the highest 
energies will not be any different, and could 
be even more rewarding.


Immunotherapy


Three is a charm for an
antibody to fight cancer


Alfred L. Garfall & Carl H. June 


Immunotherapy approaches seek to boost immune responses 
against cancer. A single antibody engineered to recognize 
three targets shows promise, when tested in animals, in 
improving the ability of T cells to target cancer. 


Antibodies with specificity for one target — 
called monoclonal antibodies — were the 
first cancer immunotherapy to achieve wide- 
spread clinical use. The therapeutic potency 
of antibodies can be amplified by engineer- 
ing them to recognize two distinct molecular 
targets (termed antigens). These bispecific 
antibodies can simultaneously bind to can- 
cer cells and immune cells called T cells, and 
this dual binding directs the T cell to unleash 
its cell-killing power towards the cancer cell. 
Writing in Nature Cancer, Wu et al.' now report
the development of a trispecific antibody, one
that has three targets: a cancer cell, a receptor
that activates T cells, and a T-cell protein that
promotes long-lasting T-cell activity against 
the cancer cell (Fig. 1). 

The mammalian immune system generates 
an immense diversity of antibodies, and anti- 
bodies can also be engineered to recognize 
therapeutic targets. Antibodies usually rec- 
ognize a single antigen, which might be part 
of a disease-causing agent or an abnormal version
of a protein or sugar. Such monospecific
antibodies against targets on cancer cells can 
recruit immune cells, including neutrophils, 
natural killer cells and macrophages, to kill 
or ingest the cancer cells. 

Antibodies can also be engineered to block 
or stimulate the function of the proteins to 
which they bind. For example, there are reg- 
ulatory receptors that inhibit T-cell function, 
and antibodies that have been engineered 
to block these receptors provide a clinical 
strategy known as checkpoint blockade, which
boosts T-cell function. These inhibitory receptors
govern T-cell exhaustion, a non-functional
T-cell state that protects against autoimmunity
and that can occur in the tumour microenvironment
as cancers evade antitumour responses
mediated by T cells. Checkpoint-blockade 
treatment can awaken exhausted antitumour 
T cells to great clinical benefit, but it also risks 
causing autoimmune toxicity. The antibody 
developed by Wu and colleagues takes a similar
approach to promote T-cell activity against
cancer cells. However, their method stimulates
the function of receptors that positively boost
T-cell function, rather than blocking the 
function of inhibitory receptors. 

The human antibody developed by Wu et al.
builds on bispecific-antibody technology that 
reconfigures the antigen-recognition domains 
of two different antibodies into one bispecific 
molecule. Bispecific antibodies usually target 
one antigen on the cancer cell’s surface and 
one on a protein complex on T cells called CD3.
CD3 is part of the T-cell receptor (TCR) com- 
plex. The TCR also includes antigen-recogni- 
tion domains and delivers an activating signal 
to the T cell when an antigen binds. Engage- 
ment of CD3 by the antibody also generates 
an activating signal. Such a bispecific antibody
therefore activates T cells, brings them into
close proximity to cancer cells — irrespective
of the T cell’s natural antigen specificity — and 
redirects their killing capabilities towards the 
cancer cells. 

This concept has proved to be clinically effec- 
tive for the bispecific antibody blinatumomab, 
which targets CD3 and the protein CD19 on 
cancer cells. Blinatumomab treatment doubles 
the remission rate and survival among people 
with an advanced stage of a cancer called B-cell
acute lymphoblastic leukaemia (B-ALL)’, and it 
is being tested as part of the initial therapy for 
B-ALL, with promising early results. 

Wu and colleagues devised a clever strategy
to simultaneously boost T-cell activation and 
enhance the targeting of cancer cells in rela- 
tion to multiple myeloma, which is a cancer 
of plasma cells in the blood. The authors 
developed a trispecific antibody that was 
engineered to have three antigen-binding 
sites, rather than two. This trispecific anti- 
body targets CD3 plus the proteins CD38 
(on cancer cells) and CD28 (on T cells). The 
CD38-targeting antibody daratumumab is 
clinically effective in treating this disease*, and 
CD38 is also a potential target in other cancers, 
such as acute lymphoid leukaemia and acute 
myeloid leukaemia. 

Figure 1 | An antibody that helps immune cells to target cancer cells. Wu et al.' report the development of
a human antibody that is engineered to bring an immune cell called a T cell into close proximity with a type
of cancer cell called a myeloma cell and to boost the T cell’s anticancer response. This trispecific antibody binds
three targets: the protein CD38 on a myeloma cell, and the protein CD28 and the protein complex CD3 on a
T cell (the antibody’s target-binding domains are shown in red, blue and yellow, respectively). CD3 is part of
the T-cell receptor (TCR), which recognizes abnormal cells by binding molecules called antigens. The binding
of CD3 by the antibody drives T-cell activation (without requiring antigen recognition by the TCR), which leads
to the killing of the myeloma cell and the production and release of toxic cytokine molecules. Binding of CD28
by the antibody drives expression of the protein Bcl-xL. Bcl-xL blocks T-cell death, which might otherwise
occur if there was prolonged TCR activation in the absence of CD28 stimulation by the antibody.

CD28 belongs to a class of protein called
co-stimulatory receptors, which positively
regulate T-cell activation. When a T cell recognizes
its target antigen through the TCR,
the extra engagement of a co-stimulatory
receptor such as CD28 is needed to achieve
the sustained T-cell proliferation required for
an effective immune response. In the absence
of co-stimulation, activation through the TCR 
can lead to a state of T-cell non-responsive- 
ness called anergy, or to the related state of 
exhaustion. Prolonged activation of the TCR 
without co-stimulation can lead the T cell to 
undergo a form of programmed cell death 
called apoptosis. 

The addition of a co-stimulatory signal 
such as CD28 is notable because this signal 
has also been incorporated into another type 
of immunotherapy called chimaeric-antigen 
receptor T cell (CAR-T) therapy’, in which a
receptor is engineered to both recognize a 
cancer-cell antigen and include T-cell acti- 
vation domains such as CD3 and CD28. The 
main reason for including a CD28-binding 
domain in the trispecific antibody is T-cell 
co-stimulation. However, CD28 is also fre- 
quently expressed by multiple myeloma cells, 
so this might increase the antibody’s affinity 
for the myeloma cells, and thus enable it to 
bind to cells in which CD38 is low, absent or 
masked by previous daratumumab therapy. 

To confirm that the CD28-binding domain 
augmented the trispecific antibody’s activity, 
the authors made versions of the antibody in 
which different combinations of the three 
binding domains were mutated. They tested 
these versions in ‘humanized’ model mice, 
which had human T cells and human myeloma 
cells. A functional CD28-targeting domain 
boosted T-cell activation above that observed 
using antibodies lacking this domain. This 
augmented T-cell activation drove T-cell 
proliferation and the expression of the 
anti-apoptotic protein Bcl-xL in T cells, sup- 
porting the authors’ hypothesis that having 
a co-stimulatory signal would prevent T-cell
apoptosis. The presence of the CD28-targeting 
domain on the antibody boosted the ability 
of T cells to kill different myeloma cell lines 
in vitro and in the humanized mouse model, 
even at the lowest antibody dose tested. 

The main limitation of this study is that the 
risk of a side effect called cytokine release syndrome
(CRS), which can occur if the immune
system is highly stimulated, is unknown. In 
CRS, the simultaneous activation of many 
T cells causes excessive release of signalling 
molecules called cytokines from cells of the 
immune system, which drives inflammation. 
CRS can occur with bispecific antibodies and 
with CAR-T. It typically manifests as fever, 
but can progress to fatal multi-organ failure 
in severe cases°. 

The authors report cytokine-related 
toxicities with their trispecific antibody when 
administered to monkeys by intravenous injec- 
tion, but toxicity was less if it was delivered 
under the skin (subcutaneously) instead, 
leading to a more gradual exposure to the 
antibody. It is reassuring that the inclusion 
of the CD28-targeting domain did not lead to 
overwhelming CRS in these tests. However,
a key caveat is that the amount of CD38 in
monkeys is much less than in people with 
multiple myeloma, and the higher amount of 
CD38, and thus of antibody-mediated T-cell 
activation, would probably increase the risk 
of CRS in humans. But in terms of possible 
negative effects of the antibody on healthy 
non-cancerous cells, it is reassuring that only 
transient decreases in the number of normal 
white blood cells that express CD38, such as 
lymphocytes and myeloid cells, were observed
in monkeys treated with the antibody. Another 
limitation of the study is that the authors did 
not assess whether this trispecific antibody 
format might trigger an immune response 
against the antibody and cause its rapid 
destruction. 

Targeting cancer using a trispecific antibody
is an important conceptual advance, building
on previous work by this group’ on a trispe- 
cific antibody that targets HIV. For multiple 
myeloma, fresh therapeutic approaches are 
needed, because even the most potent emerging
therapies, including a CAR-T that targets
an antigen called BCMA, are only temporarily
effective for most people*”®. A trispecific
antibody is a flexible platform that might
offer a way to deliver precise combinations
of immunomodulatory signals (for example,
a co-stimulatory signal and a checkpoint
blocker) specifically in the tumour microenvironment,
which might be safer and more
effective than the systemic administration of
combinations of individual, single-specificity
immunomodulatory antibodies. Such efforts
to make immunotherapy more precise and
potent than it is at present might be necessary
to broaden the reach of immunotherapy to
include the many types of cancer that have so
far proved difficult to target.


Microbiology


Microbial clues to a liver disease


Martha R. J. Clokie 


Treatment options are limited for alcoholic hepatitis, a liver 
disease associated with high alcohol intake. Studies in mice 
reveal that the microorganisms responsible for this condition 
can be tackled by a viral treatment. See p.505 


In 1984, the microbiologist Barry Marshall 
notoriously used himself as an experimen- 
tal subject for his research, and drank the 
contents of a flask containing the bacterium 
Helicobacter pylori as part of his efforts to 
demonstrate that bacteria cause stomach 
ulcers. On page 505, Duan et al. do not report 
taking such drastic action to investigate 
a bacterial connection to disease. Never- 
theless, their careful analysis of a liver dis- 
ease called alcoholic hepatitis, in studies of 
mice and analysis of samples from people 
who have the disease, also provides attention-grabbing
evidence for the involvement
of a suspected bacterial culprit.

Alcoholic hepatitis is a poorly understood
condition related to high alcohol intake, and is 
difficult to treat. Previous experiments in mice 
have hinted that the gut-dwelling bacterium 
Enterococcus faecalis might be involved?. However,
E. faecalis is usually thought of as an old
friend that inhabits the guts of many animals 
across the evolutionary tree, from humans 
to nematode worms’. This species usually 
represents less than 0.1% of all the bacteria 
in faecal samples from healthy people*. How- 
ever, after antibiotic treatment, bacteria of 
the genus Enterococcus increase in prevalence 
to become one of the most common types of 
microbe in the gut®. F. faecalis can infect the 
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Microbial clues 


to aliver disease 


Martha R. J. Clokie 


Treatment options are limited for alcoholic hepatitis, a liver 
disease associated with high alcohol intake. Studies in mice 
reveal that the microorganisms responsible for this condition 


can be tackled by a viral treatment. 


In 1984, the microbiologist Barry Marshall 
notoriously used himself as an experimen- 
tal subject for his research, and drank the 
contents of a flask containing the bacterium 
Helicobacter pylori as part of his efforts to 
demonstrate that bacteria cause stomach 
ulcers. Writing in Nature, Duan et al. do not
report taking such drastic action to investigate
a bacterial connection to disease. Nevertheless,
their careful analysis of a liver disease
called alcoholic hepatitis, in studies of mice
and analysis of samples from people who have
the disease, also provides attention-grabbing
evidence for the involvement of a suspected
bacterial culprit.

Alcoholic hepatitis is a poorly understood 
condition related to high alcohol intake, and is 
difficult to treat. Previous experiments in mice 
have hinted that the gut-dwelling bacterium 
Enterococcus faecalis might be involved. However,
E. faecalis is usually thought of as an old
friend that inhabits the guts of many animals
across the evolutionary tree, from humans
to nematode worms. This species usually
represents less than 0.1% of all the bacteria
in faecal samples from healthy people. However,
after antibiotic treatment, bacteria of
the genus Enterococcus increase in prevalence
to become one of the most common types of
microbe in the gut. E. faecalis can infect the
blood, heart, bladder and brain, and teeth that
have undergone root-canal surgery.

Duan and colleagues analysed human faecal 
samples. They identified E. faecalis in the stools
of about 80% of people with alcoholic hepatitis
that they tested, and about 30% of the strains
of E. faecalis present had genes that encode a
toxin called cytolysin. Furthermore, people
with the disease had almost 3,000 times more
E. faecalis in their stool samples than did people
who did not have alcoholic hepatitis. That
isn’t concrete proof that the disease is caused
by this bacterium. However, the authors’ 
data also show that the presence of cytolysin 
in stools correlates with mortality — 89% of 
the people whose faecal samples contained 
cytolysin died within 180 days of hospitaliza- 
tion, compared with only 3.8% of the people 
who had alcoholic hepatitis but whose stool 
samples lacked the toxin. 
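
The mortality contrast quoted above can be summarized as a risk ratio. The
following is a minimal sketch that uses only the two percentages given in the
text; the function and variable names are illustrative, and because the
underlying patient counts are not given here, no confidence interval is attempted.

```python
def risk_ratio(risk_exposed: float, risk_unexposed: float) -> float:
    """Ratio of two risks, each expressed as a fraction between 0 and 1."""
    if risk_unexposed <= 0:
        raise ValueError("risk_unexposed must be positive")
    return risk_exposed / risk_unexposed


# 180-day mortality quoted in the text, with and without cytolysin in stool.
mortality_cytolysin_positive = 0.89
mortality_cytolysin_negative = 0.038

ratio = risk_ratio(mortality_cytolysin_positive, mortality_cytolysin_negative)
print(f"Risk ratio: {ratio:.1f}")
```

A ratio of this size describes an association only; as the text notes, it is
not in itself proof that the bacterium causes the disease.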

The authors next examined the connection
between E. faecalis and liver disease
in mice. The animals were colonized with
strains of E. faecalis that either did or didn’t
make cytolysin, and some were then fed a
high-alcohol diet, with others given an
alcohol-free diet. Only the mice that were on
the high-alcohol diet and had been colonized
with cytolysin-producing E. faecalis developed
liver damage (Fig. 1a).

Then, using germ-free mice (which had 
no natural microorganisms), the authors 
transplanted stool samples from people with 
alcoholic hepatitis that contained E. faecalis 
strains in which cytolysin was either present 
or absent. Mice on a high-alcohol diet that 
were colonized with stools containing cyto- 
lysin displayed a range of signs indicating liver 
damage and the death of liver cells, whereas 
animals on such a diet and colonized with 
stools lacking cytolysin showed no major signs 
of liver damage. 

To understand the disease-causing 
mechanisms, the authors isolated liver 
cells from the animals, and found that cell 
death in response to cytolysin exposure was 
dose-dependent. The response to cytolysin 
was the same whether or not the mice had 
received a high-alcohol diet. This suggests 
that, rather than alcohol causing alcoholic
hepatitis by damaging the liver cells,
damage arises because alcohol increases the
permeability of the gut lining to allow
cytolysin-producing E. faecalis to reach the liver
and cause disease symptoms (Fig. 1a).

Figure 1 | Alcoholic hepatitis. Duan et al. report studies in mice of a liver disease called alcoholic hepatitis,
and analysis of faecal samples from people who have the disease. a, The authors report that alcoholic
hepatitis is associated with the presence of a strain of the gut-dwelling bacterium Enterococcus faecalis
that makes a toxin called cytolysin. These bacteria damage or kill liver cells, and the authors suggest
that a high-alcohol diet increases gut permeability, thereby enabling the bacteria to move from the gut
to the liver. b, To investigate possible new treatments for the disease, the authors explored the use of
bacterium-targeting viruses called phages that specifically act on cytolysin-producing E. faecalis. When
treated with these phages, E. faecalis-infected mice given a high-alcohol diet did not develop liver disease.

Given the limited treatment options for 
alcoholic hepatitis, the authors investigated 
whether steps might be taken to develop a 
therapy that exploits bacterium-targeting 
viruses called bacteriophages, or phages for 
short. Phages have the advantage over anti- 
biotics of being highly specific, and so avoid 
also killing beneficial bacteria. Furthermore, 
because the surface of a human cell differs sub- 
stantially from that of a bacterial cell, phages 
aren’t thought to infect animal or human cells.

Phages have been used to remove Salmonella
and Shigella bacteria from infected
human intestines for almost 100 years.
They have also been used to remove the
disease-causing bacterium Clostridium difficile
from artificial intestines, and from hamsters
infected with this bacterium. It has been
suggested that they might one day be used in
humans or animals to remodel the composition
of the community of gut microorganisms
(the microbiota), to produce a healthier
microbiota consisting of more bacteria associated
with good health and fewer associated with
disease. The potential of E. faecalis-targeting
phages to tackle human diseases is already
being discussed, and phages can kill
antibiotic-resistant strains of E. faecalis associated
with human bone and wound infections and
dental cavities. Furthermore, phages are
being developed for use in the food industry
to remove E. faecalis from cheese cultures
to prevent the production of toxic waste
products.

To test whether a method could be 
developed to specifically remove cytolysin- 
producing E. faecalis from mice, the authors 
identified some phages that target these 
bacteria (Fig. 1b) but leave other gut bacteria
unaffected. Mice that received human stool 
samples and a high-alcohol diet and that 
were given E. faecalis-targeting phages had 
less liver damage than did mice given phages 
that killed a different bacterium not usually 
found in animals. 

This study demonstrates the advantages of 
using phages in detective work to investigate 
the contributions of microbes to disease. The 
authors show that phages can be used to identify
disease-causing bacterial components,
and also raise the possibility that phages 
might offer potential treatment options. Fur- 
ther tests, including clinical trials, would be 
required to assess whether a phage approach 
would be useful in a human context. For example,
phage treatment might help to target
E. faecalis in the gut before a person receives
a liver transplant.

In Duan and colleagues’ study, the phages 
could treat a disease in which a causal component
is a bacterium that normally resides in the
gut, even though the disease site is elsewhere
in the body. Although much phage research
focuses on the use of these viruses to treat
diseases associated with antibiotic-resistant
bacteria, the work by Duan et al. raises the
possibility of a much wider clinical role for them.
There is growing evidence that gut microbes
can affect the function of certain cells in the
brain, and studies are ongoing to determine 
whether such microbes have a role in human 
brain diseases (see go.nature.com/2cplkfk). 
Perhaps phages could become part of the next 
generation of targeted antimicrobial therapies 
for diseases that are currently difficult to treat. 
Indeed, there might be many diseases that we 
currently don’t realize have a microbial com- 
ponent, and which could be tackled by phages. 

Virtual and augmented reality enhanced by touch


Xiao-ming Tao 


Conventional technologies for virtual and augmented reality 
simulate interactive experiences through visual and auditory 
stimuli. A technology that adds sensations of touch could find 
uses in areas from gaming to prosthetic feedback. See p.473 


Human sensation includes the commonly
known senses and less-recognized ones such as
thirst, hunger and balance. Stimuli detected by
sensory receptors are encoded into electrical
signals that move along neural pathways to specific
parts of the brain to be decoded into useful
information. The whole process is complex.
For instance, the sense of touch is a collection
of several sensations, encompassing pressure,
pain and temperature, and touch receptors are
stimulated by a combination of mechanical,
chemical and thermal energy. Until now, it has
been a great challenge to incorporate sensations
of touch into virtual and augmented
reality. But on page 473, Yu et al. report a
skin-integrated technology that applies pressure,
vibration or motion to the user, enabling communication
between the user and a machine
for virtual and augmented reality (X. Yu et al.
Nature 575, 473-479; 2019).

The authors’ technology consists of a soft, 
lightweight sheet of electronics that adheres 
to skin, and conforms to the body’s shape, 
in a convenient, non-invasive and reversible 
manner (Fig. 1). The sheet contains arrays 
of vibratory actuators — mechanical com- 
ponents that convert electrical energy into 
vibrations. Each actuator comprises two 
connected parts: a coil of copper wire sealed
in an acrylic base, and a permanent magnet
mounted on a polymer beam. When an electric
current passes through the coil, the magnet
vibrates at the same frequency as that of
the current.

Each actuator has a mass of only 1.4 grams
and is millimetre-sized (12-18 mm in diameter
and 2.5 mm thick). Given that human skin can
detect submillimetre-scale touch patterns, 
one might question whether the actuators 
can be scaled down. The authors proposed 
a method to achieve such miniaturization 
and tested it by running simulations. They 
found that the diameter and thickness of each 
actuator could in future be reduced by a factor 
of ten and three, respectively. 
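
The scaling described above is simple arithmetic, sketched below. The
dimensions and reduction factors are those quoted in the text; the variable
and function names are illustrative only, and the sketch says nothing about
whether a miniaturized actuator would deliver the same vibration amplitude.

```python
# Current actuator dimensions quoted in the text (millimetres).
diameter_range_mm = (12.0, 18.0)
thickness_mm = 2.5

# Reduction factors from the authors' simulations, as reported in the text.
DIAMETER_FACTOR = 10
THICKNESS_FACTOR = 3


def scaled(value_mm: float, factor: float) -> float:
    """Shrink a linear dimension by the given factor."""
    return value_mm / factor


mini_diameter_mm = tuple(scaled(d, DIAMETER_FACTOR) for d in diameter_range_mm)
mini_thickness_mm = scaled(thickness_mm, THICKNESS_FACTOR)

print(f"Diameter: {mini_diameter_mm[0]:.1f}-{mini_diameter_mm[1]:.1f} mm")
print(f"Thickness: {mini_thickness_mm:.2f} mm")
```

On these numbers the diameter would fall to between one and two millimetres
and the thickness to below one millimetre.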

A key feature of Yu and colleagues’ device is
that its actuator arrays are wirelessly powered 
and controlled. It is therefore less cumbersome 
than wearable platforms that require connect- 
ing wires or internal batteries. The system uses 
a primary antenna for power transmission, a 
few other antennas for controlling and driving 
the actuators, and an intermediate antenna to
boost the power harvested from the primary 
antenna. Yu et al. found that the inclusion of 
the intermediate antenna increased the col- 
lected power by a factor of about three. The 
authors carried out simulations to confirm 
that their device complies with guidelines 
from the US Federal Communications Commission
and the Food and Drug Administration
regarding safe levels of radiation exposure and 
tissue absorption. 

The distance between the power source
and the platform needs to be less than about
one metre, making the technology suitable for
certain applications in virtual and augmented
reality. Yu et al. describe three particular examples.
In the first one, a girl touches a screen
that displays a video feed of her grandmother;
the grandmother senses the touch through
devices on her hand and arm. In the second
example, a man who has a lower-arm amputation
grasps an object using a prosthetic
arm that has a robotic hand; a device on his
upper arm generates a pattern of sensation
that reproduces the object’s shape. In the third
example, a person playing a combat-based
video game wears several devices across their
body; the devices are activated when a strike
occurs on the corresponding body part of the
game character.

Figure 1 | Sense of virtual touch. Yu et al. present a device for incorporating touch-based sensations in
virtual and augmented reality. The device consists of a lightweight sheet of electronics that softly laminates
onto the skin. In this simple example, a touch screen displays a video feed of a person wearing the device, and
a second person touches the image of the device on the screen. Mechanical components called vibrational
actuators apply vibrations to the skin of the person wearing the device, providing a sense of virtual touch. The
colours of the actuators represent their degree of activation from low (yellow) to high (red).

The technology does have some drawbacks. 
For instance, each actuator is driven by a set 
current of about 5 milliamps, which is rela- 
tively high compared with that found in other 
consumer digital electronics. In addition, the 
energy lost from the other components as heat 
might affect actuator performance and cause 
warming of the skin if dissipation of the heat 
is not well managed. Moreover, although an 
optimized actuator requires only 1.75 milli- 
watts of power, the overall power consumption 
of the technology is still a key limiting factor 
in operating the platform sustainably and
wirelessly for practical use. Miniaturization
of the actuators could be a feasible way to 
address these issues, as the authors point out. 

In 2002, many people were inspired by
a smart wearable invention known as the
Hug Shirt, which allows hugs to be sent over
a distance with the same ease as sending
a text message or chatting (see go.nature.com/32kgloz).
This technology is equipped
with embedded sensors that detect and
encode the strength, duration and location 
of the touch, together with the skin warmth 
and heart rate of the sender. Through wireless 
communication and a control circuit, these 
signals are decoded to control actuators that 
reproduce the sensation of the hug for the 
receiver. Both the Hug Shirt and Yu and col- 
leagues’ device suggest that the application 
of touch sensations in virtual and augmented 
reality is just beginning, and that more exciting 
progress can be expected in the future. 

Long-duration γ-ray bursts (GRBs) are the most luminous sources of electromagnetic
radiation known in the Universe. They arise from outflows of plasma with velocities
near the speed of light that are ejected by newly formed neutron stars or black holes
(of stellar mass) at cosmological distances. Prompt flashes of megaelectronvolt-energy
γ-rays are followed by a longer-lasting afterglow emission in a wide range of
energies (from radio waves to gigaelectronvolt γ-rays), which originates from
synchrotron radiation generated by energetic electrons in the accompanying shock
waves. Although emission of γ-rays at even higher (teraelectronvolt) energies by
other radiation mechanisms has been theoretically predicted, it has not been
previously detected. Here we report observations of teraelectronvolt emission from
the γ-ray burst GRB 190114C. γ-rays were observed in the energy range 0.2–1
teraelectronvolt from about one minute after the burst (at more than 50 standard
deviations in the first 20 minutes), revealing a distinct emission component of the
afterglow with power comparable to that of the synchrotron component. The
observed similarity in the radiated power and temporal behaviour of the
teraelectronvolt and X-ray bands points to processes such as inverse Compton
upscattering as the mechanism of the teraelectronvolt emission. By contrast,
processes such as synchrotron emission by ultrahigh-energy protons are not
favoured because of their low radiative efficiency. These results are anticipated to be a
step towards a deeper understanding of the physics of GRBs and relativistic shock
waves.


GRB 190114C was first identified as a long-duration GRB by the Burst
Alert Telescope (BAT) onboard the Neil Gehrels Swift Observatory
(Swift) and the Gamma-ray Burst Monitor (GBM) instrument onboard
the Fermi satellite on 14 January 2019, 20:57:03 universal time (UT)
(hereafter $T_0$). Its duration in terms of $T_{90}$ (the time interval containing
90% of the total photon counts) was measured to be about 116 s by
Fermi-GBM and about 362 s by Swift-BAT. Soon afterwards, reports
followed on the detection of its afterglow emission at various wavebands
from 1.3 GHz to 23 GeV and the measurement of its redshift,
$z = 0.4245 \pm 0.0005$ (corresponding to cosmic distance). The
isotropic-equivalent energy of the emission at energy of $\varepsilon = 1$–$10^{4}$ keV
during $T_{90}$ observed by Fermi-GBM was $E_{\rm iso} \approx 3 \times 10^{53}$ erg (1 erg = $10^{-7}$ J),
implying that GRB 190114C was fairly energetic, but not exceptionally
so compared to previous events (Methods).
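
The definition of $T_{90}$ given above can be illustrated with a short sketch.
It assumes a common convention, not stated in the text, in which the interval
runs from the time at which 5% of the cumulative counts have accumulated to the
time at which 95% have accumulated. The toy light curve is invented for
illustration and is unrelated to the GBM or BAT data.

```python
from itertools import accumulate


def t90(times, counts):
    """Interval between the 5% and 95% points of the cumulative counts.

    times  -- bin start times in seconds (ascending)
    counts -- background-subtracted counts per bin
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("light curve has no counts")
    cumulative = list(accumulate(counts))
    # First bin at which the cumulative counts reach each threshold.
    t05 = next(t for t, c in zip(times, cumulative) if c >= 0.05 * total)
    t95 = next(t for t, c in zip(times, cumulative) if c >= 0.95 * total)
    return t95 - t05


# Invented toy light curve: 1-s bins, a bright pulse followed by a faint tail.
toy_times = list(range(10))
toy_counts = [0, 50, 30, 10, 4, 2, 2, 1, 1, 0]

print(f"T90 of the toy light curve: {t90(toy_times, toy_counts)} s")
```

Because the result depends on the counts that a given detector records, a
faint tail can lengthen $T_{90}$ considerably even when most of the counts
arrive early.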

Triggered by the Swift-BAT alert, the Major Atmospheric Gamma
Imaging Cherenkov (MAGIC) telescopes observed GRB 190114C from
$T_0 + 57$ s until $T_0 + 15{,}912$ s (Extended Data Fig. 1). γ-rays with energies
above 0.2 TeV were detected with high significance from the beginning
of the observations; in the first 20 minutes of the data, the
significance of the total γ-ray signal is more than 50 standard deviations
(Methods, Extended Data Fig. 2).

For cosmologically distant objects such as GRBs, the observed γ-ray
spectra can be substantially modified owing to attenuation by the
extragalactic background light (EBL). The EBL is the diffuse background
of infrared, optical and ultraviolet radiation that permeates
intergalactic space, constituting the emission from all galaxies in the
Universe. γ-rays can be effectively absorbed during their propagation
via photon–photon pair-production interactions with low-energy
photons of the EBL; this absorption is more severe for higher photon
energies and higher redshifts. The γ-ray spectrum that would be
observed if the EBL was absent, referred to as the intrinsic spectrum,
can be inferred from the observed spectrum by ‘correcting’ for EBL
attenuation, assuming a plausible model of the EBL.
Emission from GRBs occurs in two stages, which can partially overlap
in time. The ‘prompt’ emission phase is characterized by a brief but
intense flash of γ-rays, primarily at megaelectronvolt energies. It exhibits
irregular variability on timescales shorter than milliseconds and
lasts up to hundreds of seconds for long-duration GRBs. These γ-rays
are generated in the inner regions of collimated jets of plasma, which
are ejected with ultrarelativistic velocities from highly magnetized
neutron stars or black holes that form following the death of massive
stars. The ensuing ‘afterglow’ phase is characterized by emission that
spans a broader wavelength range and decays gradually over much
longer timescales compared to the prompt emission. This originates
from shock waves caused by the interaction of the jet with the ambient
gas (‘external shocks’). Its evolution is typified by a power-law decay
in time owing to the self-similar properties of the decelerating shock
wave. The afterglow emission of previously observed GRBs, from
radio frequencies to gigaelectronvolt energies, is generally interpreted
as synchrotron radiation from energetic electrons that are accelerated
within magnetized plasma at the external shock. Clues to whether
the newly observed teraelectronvolt emission is associated with the
prompt or the afterglow phase are offered by the observed light curve
(flux $F(t)$ as a function of time $t$).

Fig. 1 | Light curves in the kiloelectronvolt, gigaelectronvolt and
teraelectronvolt bands, and spectral evolution in the teraelectronvolt band
for GRB 190114C. a, Light curves in units of energy flux (left axis) and apparent
luminosity (right axis), for MAGIC at 0.3–1 TeV (red symbols), the Fermi Large
Area Telescope (LAT) at 0.1–10 GeV (purple band) and the Swift X-ray Telescope
(XRT) at 1–10 keV (green band). For the MAGIC data, the intrinsic flux is shown,
corrected for EBL attenuation from the observed flux. b, Temporal evolution
of the power-law photon index, determined from time-resolved intrinsic
spectra. The horizontal dashed line indicates the value −2. The errors shown in
both panels are statistical only (one standard deviation).

Figure 1 shows such a light curve for the EBL-corrected intrinsic flux in
the energy range $\varepsilon = 0.3$–1 TeV (see also Extended Data Table 1). It is well
fitted with a simple power-law function $F(t) \propto t^{\beta}$ with $\beta = -1.60 \pm 0.07$.
The flux evolves from $F(t) \approx 5 \times 10^{-8}$ erg cm$^{-2}$ s$^{-1}$ at $t \approx T_0 + 80$ s to
$F(t) \approx 6 \times 10^{-10}$ erg cm$^{-2}$ s$^{-1}$ at $t \approx T_0 + 10^{3}$ s, after which it falls below the
sensitivity level of the telescopes and is undetectable. There is no clear
evidence for breaks or cutoffs in the light curve, nor irregular variability
beyond the monotonic decay. The light curves in the kiloelectronvolt
and gigaelectronvolt bands display behaviour similar to the teraelectronvolt
band, with a somewhat shallower decay slope for the gigaelectronvolt
band (Fig. 1). These properties indicate that most of the
observed emission is associated with the afterglow phase, rather than
the prompt phase, which typically shows irregular variability. We note
that although the measured $T_{90}$ is as long as about 360 s, the
kiloelectronvolt–megaelectronvolt emission does not exhibit clear temporal or
spectral evidence for a prompt component after about $T_0 + 25$ s
(Methods). Nevertheless, a sub-dominant contribution to the teraelectronvolt
emission from a prompt component at later times cannot
be excluded. The flux initially observed at $t \approx T_0 + 80$ s corresponds to
an apparent isotropic-equivalent luminosity of $L_{\rm iso} \approx 3 \times 10^{49}$ erg s$^{-1}$ at
$\varepsilon = 0.3$–1 TeV, making this the most luminous source known at these
energies.
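
The step from an observed flux to an apparent isotropic-equivalent luminosity
can be sketched as $L = 4\pi d_L^2 F$, where $d_L$ is the luminosity distance
at the measured redshift. The cosmological parameters below (a flat ΛCDM model
with $H_0 = 70$ km s$^{-1}$ Mpc$^{-1}$ and $\Omega_{\rm m} = 0.3$) are
assumptions made for this illustration and are not taken from the text; the
function names are likewise illustrative, and no band correction is applied.

```python
import math

C_KM_S = 299_792.458      # speed of light, km/s
MPC_IN_CM = 3.0857e24     # centimetres per megaparsec


def luminosity_distance_mpc(z, h0=70.0, omega_m=0.3, steps=10_000):
    """Luminosity distance in Mpc for a flat Lambda-CDM model (assumed parameters)."""
    omega_lambda = 1.0 - omega_m
    dz = z / steps
    # Midpoint-rule integral of dz' / E(z') from 0 to z.
    integral = 0.0
    for i in range(steps):
        zi = (i + 0.5) * dz
        integral += dz / math.sqrt(omega_m * (1.0 + zi) ** 3 + omega_lambda)
    comoving_mpc = (C_KM_S / h0) * integral
    return (1.0 + z) * comoving_mpc


def isotropic_luminosity(flux_cgs, z):
    """L = 4 pi d_L^2 F in erg/s, for a flux in erg cm^-2 s^-1."""
    d_l_cm = luminosity_distance_mpc(z) * MPC_IN_CM
    return 4.0 * math.pi * d_l_cm ** 2 * flux_cgs


d_l = luminosity_distance_mpc(0.4245)
l_iso = isotropic_luminosity(5e-8, 0.4245)
print(f"d_L = {d_l:.0f} Mpc, L_iso = {l_iso:.1e} erg/s")
```

Under these assumptions the initial flux corresponds to a luminosity of a few
times $10^{49}$ erg s$^{-1}$, in line with the value quoted in the text.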

Fig. 2 | Spectrum above 0.2 TeV averaged over the period between $T_0 + 62$ s
and $T_0 + 2{,}454$ s for GRB 190114C. Spectral-energy distributions for the
spectrum observed by MAGIC (grey open circles) and the intrinsic spectrum
corrected for EBL attenuation (blue filled circles). The errors on the flux
correspond to one standard deviation. The upper limits at 95% confidence level
are shown for the first non-significant bin at high energies. Also shown is the
best-fit model for the intrinsic spectrum (black curve) when assuming a
power-law function. The grey solid curve for the observed spectrum is obtained by
convolving this curve with the effect of EBL attenuation. The grey dashed curve
is the forward-folding fit to the observed spectrum with a power-law function
(Methods).

The power radiated in the teraelectronvolt band is comparable,
within a factor of about 2, to that in the soft-X-ray and gigaelectronvolt
bands during the periods when simultaneous teraelectronvolt–kiloelectronvolt
or teraelectronvolt–gigaelectronvolt data are available
(Fig. 1). The isotropic-equivalent energy radiated at $\varepsilon = 0.3$–1 TeV,
integrated over the time period between $T_0 + 62$ s and $T_0 + 2{,}454$ s, is
$E_{0.3-1\,{\rm TeV}} \approx 4 \times 10^{51}$ erg. This is a lower limit to the total teraelectronvolt-band
output, as it does not account for data before $T_0 + 62$ s or potential emission
at $\varepsilon > 1$ TeV. From the megaelectronvolt–gigaelectronvolt data, the
power-law decay phase is inferred to start at about $T_0 + 6$ s.
Assuming that the MAGIC light curve evolved as $F(t) \propto t^{\beta}$ after that
time, the teraelectronvolt-band energy integrated between $T_0 + 6$ s
and $T_0 + 2{,}454$ s is $E_{0.3-1\,{\rm TeV}} \approx 2 \times 10^{52}$ erg. This would be about 10% of the
$E_{\rm iso}$ value measured by Fermi-GBM at $\varepsilon = 1$–$10^{4}$ keV.
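
The two energy estimates above can be reproduced roughly by integrating the
fitted power law and converting the resulting fluence $S$ to an
isotropic-equivalent energy, $E = 4\pi d_L^2 S/(1+z)$. The sketch below
normalizes the power law to the flux quoted at $T_0 + 80$ s and assumes a
luminosity distance of about $7.2 \times 10^{27}$ cm (roughly 2.3 Gpc), a value
that is not given in the text. It is an order-of-magnitude check with
illustrative function names, not the analysis applied to the data.

```python
import math


def power_law_fluence(f_ref, t_ref, beta, t_start, t_end):
    """Integrate F(t) = f_ref * (t / t_ref)**beta from t_start to t_end (beta != -1)."""
    k = beta + 1.0
    return f_ref * t_ref ** (-beta) * (t_end ** k - t_start ** k) / k


def isotropic_energy(fluence_cgs, d_l_cm, z):
    """E = 4 pi d_L^2 S / (1 + z), for a fluence S in erg cm^-2."""
    return 4.0 * math.pi * d_l_cm ** 2 * fluence_cgs / (1.0 + z)


Z = 0.4245
D_L_CM = 7.2e27                        # assumed luminosity distance, not from the text
F_REF, T_REF, BETA = 5e-8, 80.0, -1.60  # flux at T0 + 80 s and fitted decay index
T_END = 2454.0

e_from_62 = isotropic_energy(
    power_law_fluence(F_REF, T_REF, BETA, 62.0, T_END), D_L_CM, Z)
e_from_6 = isotropic_energy(
    power_law_fluence(F_REF, T_REF, BETA, 6.0, T_END), D_L_CM, Z)

print(f"From T0 + 62 s: {e_from_62:.1e} erg")
print(f"From T0 + 6 s:  {e_from_6:.1e} erg")
```

Extending the integration back to $T_0 + 6$ s raises the estimate several-fold
because, for a decay this steep, most of the energy is emitted at early times.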

Figure 1 also shows the time evolution of the intrinsic spectral photon
index $\alpha_{\rm int}$, determined by fitting the EBL-corrected, time-dependent
differential photon spectrum with the power-law function $dF/d\varepsilon \propto \varepsilon^{\alpha_{\rm int}}$.
Considering the statistical and systematic errors (Methods), there is
no significant evidence for spectral variability. Throughout the observations,
the data are consistent with $\alpha_{\rm int} \approx -2$, indicating that the radiated
power is nearly equally distributed in $\varepsilon$ over this band.
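
The link between a photon index of about $-2$ and equal power per logarithmic
energy interval follows from integrating $\varepsilon\,dF/d\varepsilon \propto
\varepsilon^{\alpha_{\rm int}+1}$. The sketch below evaluates that integral
analytically in arbitrary units; the function name and the band edges are
chosen purely for illustration.

```python
import math


def energy_flux(alpha, e_lo, e_hi, norm=1.0):
    """Integrate e * dF/de = norm * e**(alpha + 1) between e_lo and e_hi."""
    if math.isclose(alpha, -2.0):
        # The integrand is norm / e, which integrates to a logarithm.
        return norm * math.log(e_hi / e_lo)
    k = alpha + 2.0
    return norm * (e_hi ** k - e_lo ** k) / k


# Two bands of equal logarithmic width (one decade each).
low_band = energy_flux(-2.0, 0.1, 1.0)
high_band = energy_flux(-2.0, 1.0, 10.0)
print(f"alpha = -2: {low_band:.3f} versus {high_band:.3f}")

# A steeper spectrum concentrates the power at low energies instead.
steep_low = energy_flux(-3.0, 0.1, 1.0)
steep_high = energy_flux(-3.0, 1.0, 10.0)
print(f"alpha = -3: {steep_low:.3f} versus {steep_high:.3f}")
```

For an index of exactly $-2$ each decade carries the same power, whereas the
steeper example places ten times more power in the lower decade.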

Figure 2 presents both the observed and the EBL-corrected intrinsic
spectra above 0.2 TeV, averaged over ($T_0 + 62$ s, $T_0 + 2{,}454$ s). The
observed spectrum can be fitted in the energy range 0.2–1 TeV with a
simple power law with photon index $\alpha_{\rm obs} = -5.43 \pm 0.22$ (statistical error
only), one of the steepest spectra ever observed for a γ-ray source. It
is remarkable that photons are observed at $\varepsilon \approx 1$ TeV (Extended Data
Table 2), despite the severe EBL attenuation expected at these energies
(by a factor of about 300, according to plausible EBL models; see Methods).
Assuming a particular EBL model, the intrinsic spectrum is well
described as a power law with $\alpha_{\rm int} = -2.22^{+0.23}_{-0.25}$ (statistical error only),
extending beyond 1 TeV at 95% confidence level with no evidence for
a spectral break or cutoff (Methods). Adopting other EBL models leads
to only small differences in $\alpha_{\rm int}$, which are within the uncertainties
(Methods). Consistency with $\alpha_{\rm int} \approx -2$ implies a roughly equal power
radiated over 0.2–1 TeV and possibly beyond, strengthening the inference
that there is substantial energy output at teraelectronvolt
energies.

Fig. 3 | Distribution of the number of teraelectronvolt-band γ-rays in time
and energy for GRB 190114C. The number of events in each bin of energy and
time are colour-coded (Methods). The vertical line indicates the beginning of
the data acquisition. The curves show the expected maximum photon energy
$\varepsilon_{\rm syn,max}$ of electron synchrotron radiation in the standard afterglow theory for
two extreme cases giving high values of $\varepsilon_{\rm syn,max}$. The dotted curve corresponds
to an isotropic-equivalent blast-wave kinetic energy of $E_{\rm k,iso} = 3 \times 10^{55}$ erg and a
homogeneous external medium with density $n = 0.01$ cm$^{-3}$; the dashed curve
corresponds to $E_{\rm k,iso} = 3 \times 10^{55}$ erg and an external medium describing a
progenitor stellar wind with a density profile of $n(R) = A R^{-2}$ as a function of
radius $R$, where $A = 3 \times 10^{35}$ cm$^{-1}$ (Methods).

Much of the observed emission up to gigaelectronvolt energies for
GRB 190114C is probably afterglow synchrotron emission from electrons,
similar to that of many previous GRBs. The teraelectronvolt
emission observed here is also plausibly associated with the afterglow.
However, it cannot be a simple spectral extension of the electron synchrotron
emission. The maximum energy of the emitting electrons
is determined by the balance between their energy losses, which are
dominated by synchrotron radiation, and their acceleration. The timescale
of the latter should not be much shorter than that of their gyration
around the magnetic field at the external shock. The energy of
afterglow synchrotron photons is then limited to a maximum value, the
so-called synchrotron burnoff limit of $\varepsilon_{\rm syn,max} \approx 100\,(\Gamma_{\rm b}/1{,}000)$ GeV,
which depends only on the bulk Lorentz factor $\Gamma_{\rm b}$. The latter is unlikely
to considerably exceed $\Gamma_{\rm b} \approx 1{,}000$ (Methods). Figure 3 compares the
observed photon energies with expectations of $\varepsilon_{\rm syn,max}$ under different
assumptions. Although a few γ-rays with energy approaching $\varepsilon_{\rm syn,max}$
have been previously detected from a GRB by Fermi, the evidence for
a separate spectral component was not conclusive, given the uncertainties
in $\Gamma_{\rm b}$, the electron acceleration rate and the spatial structure of the
emitting region. Here, even the lowest-energy photons detected by
MAGIC are considerably above $\varepsilon_{\rm syn,max}$ and extend beyond 1 TeV at 95%
confidence level (Methods). Thus, this observation provides the first
unequivocal evidence for a new emission component beyond synchrotron
emission in the afterglow of a GRB. Moreover, this component is
energetically important, with a power nearly comparable to that of the
synchrotron component observed contemporaneously.
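
The comparison at the heart of this argument can be sketched numerically. The
function below simply evaluates the burnoff limit quoted above for a given bulk
Lorentz factor and compares it with the 0.2 TeV lower edge of the band in which
MAGIC detected the burst. The function name and the trial Lorentz factors are
illustrative, and the sketch ignores the decline of $\Gamma_{\rm b}$ with time
that the curves in Fig. 3 take into account.

```python
def synchrotron_burnoff_gev(bulk_lorentz_factor: float) -> float:
    """Approximate maximum synchrotron photon energy: 100 * (Gamma_b / 1000) GeV."""
    return 100.0 * (bulk_lorentz_factor / 1000.0)


MAGIC_LOWEST_ENERGY_GEV = 200.0   # 0.2 TeV, lower edge of the detected band

# Trial bulk Lorentz factors (illustrative values only).
for gamma_b in (100.0, 300.0, 1000.0):
    limit = synchrotron_burnoff_gev(gamma_b)
    exceeds = MAGIC_LOWEST_ENERGY_GEV > limit
    print(f"Gamma_b = {gamma_b:6.0f}: limit = {limit:5.1f} GeV, exceeded = {exceeds}")
```

Even for $\Gamma_{\rm b} \approx 1{,}000$, which the text regards as close to the
largest plausible value, the limit lies a factor of two below the lowest
detected energies.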

Comparing with previous MAGIC observations of GRBs, the fact 
that GRB 190114C was the first to be clearly detected may be due to a
favourable combination of its low redshift and suitable observing condi- 
tions, rather than its intrinsic properties being exceptional (Methods), 
although firm conclusions cannot yet be drawn with only one positive 
detection. The capability of the telescopes to react fast and operate 
during moonlight conditions was crucial in achieving this detection. 

The discovery of an energetically important emission component 
beyond electron synchrotron emission that may be common in GRB 
afterglows offers important new insight into the physics of GRBs. The 
similarity of the radiated power and temporal decay slopes in the terae- 
lectronvolt and X-ray bands suggests that this component is intimately 
related to the electron synchrotron emission. Promising mechanisms 
for the teraelectronvolt emission are ‘leptonic’ processes in the after- 
glow such as inverse Compton radiation, in which the electrons in 
the external shock Compton-scatter ambient low-energy photons to 
higher energies. On the other hand, ‘hadronic’ processes induced by
ultrahigh-energy protons in the external shock may also be viable
if the acceleration of electrons and protons occurs in a correlated manner.
However, such processes typically have low radiative efficiency,
and are not favoured as the origin of the luminous teraelectronvolt
emission observed in GRB 190114C for cases such as proton synchrotron
emission (Methods). Continuing efforts with existing and future γ-ray
telescopes will test these expectations and provide further insight into 
the physics of GRBs and related issues. 

Methods


General properties of GRB 190114C 

GRB 190114C was first identified by the Swift-BAT and Fermi-GBM
instruments on 14 January 2019, 20:57:03 UT. Subsequently, it was
also detected by several other space-based instruments, including
Fermi-LAT, INTEGRAL/SPI-ACS, AGILE/MCAL, Insight/HXMT and
Konus-Wind. Its redshift was reported as $z = 0.4245 \pm 0.0005$ by
the Nordic Optical Telescope and confirmed by Gran Telescopio
Canarias. The measured duration of $T_{90} \approx 116$ s by Fermi-GBM and
$T_{90} \approx 362$ s by Swift-BAT puts GRB 190114C unambiguously in the
long-duration subclass of GRBs. The fluence and peak photon flux of
the emission at 10–1,000 keV during $T_{90}$ measured by Fermi-GBM are
$(3.990 \pm 0.008) \times 10^{-4}$ erg cm$^{-2}$ and $(246.86 \pm 0.86)$ cm$^{-2}$ s$^{-1}$ (ref. 5). The
corresponding isotropic-equivalent energy and luminosity at 1–$10^{4}$ keV
are $E_{\rm iso} \approx 3 \times 10^{53}$ erg and $L_{\rm iso} \approx 1 \times 10^{53}$ erg s$^{-1}$, respectively. These values
are consistent with the known correlations between the spectral peak
energy $E_{\rm peak}$ and $E_{\rm iso}$ and between $E_{\rm peak}$ and $L_{\rm iso}$ for GRBs.
The light curve of the kiloelectronvolt–megaelectronvolt emission
exhibits two prominent emission episodes with irregular multi-peaked
structure at $t \approx 0$–5 s and $t \approx 15$–25 s (Extended Data Fig. 1). The spectra
for these episodes are typical of GRB prompt emission. On the other
hand, at $t \approx 15$–25 s and $t > 25$ s, the temporal and spectral properties
of the kiloelectronvolt–megaelectronvolt emission are consistent
with an afterglow component, indicating a considerable overlap in
time between the prompt and afterglow phases. Indeed, from a joint
spectral and temporal analysis of the Fermi-GBM and Fermi-LAT data,
the onset of the afterglow for GRB 190114C was estimated to occur at
$t \approx 6$ s, much earlier than $T_{90}$.

The event is fairly energetic but not exceptionally so, with $E_{\rm iso}$ lying
in the highest ~30% of its known distribution. No neutrinos were
detected by the IceCube Observatory in the energy range 100 TeV to
10 PeV, under non-optimal observing conditions.


MAGIC telescopes and automatic alert system 

The MAGIC telescopes comprise two 17-m diameter imaging atmospheric
Cherenkov telescopes (IACTs; MAGIC-I and MAGIC-II) operating
in stereoscopic mode, located at the Roque de los Muchachos Observatory
in La Palma, Canary Islands, Spain. By imaging Cherenkov light
from extended air shower events, the telescopes can detect γ-rays above
an energy threshold of 30 GeV, depending on the observing mode and
conditions, with a field of view of ~10 square degrees.

Observing GRBs with IACTs such as those of MAGIC warrants a dedi- 
cated strategy. Because IACTs have a low probability of discovering 
GRBs serendipitously in their relatively small field of view, they rely on 
external alerts provided by satellite instruments with larger fields of 
view to trigger follow-up observations. Since their inception, the MAGIC 
telescopes were designed to perform fast follow-up observations of 
GRBs. By virtue of their light-weight reinforced-carbon-fibre structure 
and high repositioning speed, they can respond quickly to GRB alerts 
received via the Gamma-ray Coordinates Network (GCN;
https://gcn.gsfc.nasa.gov). After various updates to the entire system over the
years, the telescopes can currently slew to a target with a repositioning
speed of 7° s$^{-1}$. To achieve the fastest possible response to GRB
alerts, an automatic alert system (AAS) has been developed, which is a
multi-threaded programme that performs different tasks, such as connecting
to the GCN servers, receiving GCN notices that contain the sky
coordinates of the GRB and sending commands to the Central Control
(CC) software of the MAGIC telescopes. This also includes a check of the 
visibility of the new target according to predefined criteria. A priority 
list has been set up for cases in which several different types of alerts are 
received simultaneously. Moreover, if there are multiple alerts for the 
same GRB, the AAS selects the one with the best localization. 
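
The selection logic described above can be illustrated with a minimal sketch. It is not the AAS code: the `Alert` fields, the zenith-angle visibility criterion and its 60° default are all hypothetical stand-ins for the "predefined criteria" mentioned in the text.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Alert:
    """One notice for a burst (all field names are hypothetical)."""
    grb_id: str              # identifier shared by notices for the same GRB
    instrument: str          # instrument that issued the notice
    ra_deg: float            # right ascension of the localization centre
    dec_deg: float           # declination of the localization centre
    error_radius_deg: float  # localization uncertainty
    zenith_deg: float        # zenith angle of the target at the site


def is_observable(alert: Alert, max_zenith_deg: float) -> bool:
    """Toy visibility check: the target must be above a zenith-angle limit."""
    return alert.zenith_deg <= max_zenith_deg


def select_alert(alerts: Iterable[Alert], grb_id: str,
                 max_zenith_deg: float = 60.0) -> Optional[Alert]:
    """Return the observable notice with the smallest error radius, if any.

    The 60 degree default is an assumed placeholder, not a MAGIC setting.
    """
    candidates = [a for a in alerts
                  if a.grb_id == grb_id and is_observable(a, max_zenith_deg)]
    if not candidates:
        return None
    return min(candidates, key=lambda a: a.error_radius_deg)


# Two made-up notices for the same burst; the better-localized one is chosen.
notices = [
    Alert("GRB-X", "instrument A", 54.51, -26.94, 3.0, 55.8),
    Alert("GRB-X", "instrument B", 54.50, -26.95, 0.05, 55.8),
]
best = select_alert(notices, "GRB-X")
```

Keeping the visibility check separate from the ranking step mirrors the order given in the text: an alert is first validated as observable, and only then compared with other alerts for the same GRB.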

If an alert is tagged as observable by the AAS, the telescopes automatically
repoint to the new sky position. An automatic procedure,
implemented in 2013, prepares the subsystems for data taking during
the telescope slewing: data taken previously are saved, relevant
trigger tables are loaded, appropriate electronics thresholds are
set and the mirror segments are suitably adjusted by the Automatic
Mirror Control hardware. While moving, the telescopes calibrate the
imaging cameras. The data acquisition system continues taking data
while it receives information about the target from the CC software.
The presence of a trigger limiter set to 1 kHz prevents high rates and
the saturation of the data acquisition system. When the repositioning
has finished, the target is tracked in wobble mode, which is the
standard observing mode for MAGIC. The fastest GRB follow-up so far
was achieved for GRB 160821B, when data taking started only
24 s after the GRB.


MAGIC observations of GRB 190114C 

On the night of 14 January 2019, at 20:57:25 UT ($T_0$ + 22 s), Swift-BAT
distributed an alert reporting the first estimated coordinates
of GRB 190114C (right ascension, +03 h 38 min 02 s; declination,
-26 d 56 min 18 s). The AAS validated it as observable and triggered
the automatic repointing procedure, and the telescopes began slewing
in fast mode from their position before the alert. The MAGIC-I and
MAGIC-II telescopes were on target and began tracking GRB 190114C
at 20:57:52.858 UT and 20:57:53.260 UT ($T_0$ + 50 s), respectively, starting
from a zenith angle of 55.8° and an azimuth angle of 175.1° in local
coordinates. After starting the slewing, the telescopes reached the target
position in approximately 27 s, moving by 42.82° in zenith and 177.5°
in azimuth. At the end of the slewing, the cameras on the telescopes
oscillated for a short time. Subsequently, we performed dedicated
tests that reproduced the movement of the telescopes. We verified
that the duration of the oscillations was less than 10 s after the start of
the tracking, and their amplitude was less than 0.6′ when data taking
began. Data acquisition started at 20:58:00 ($T_0$ + 57 s) and the data
acquisition system was operating stably from 20:58:05 ($T_0$ + 62 s), as
denoted in Extended Data Fig. 1.
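
The offsets relative to $T_0$ quoted above follow directly from the UT timestamps. The short sketch below redoes that arithmetic; the timestamps are taken from the text, while the dictionary labels are only illustrative names.

```python
from datetime import datetime

# Trigger time T0 of GRB 190114C (14 January 2019, 20:57:03 UT), from the text.
T0 = datetime(2019, 1, 14, 20, 57, 3)

# Timestamps quoted in the text; the keys are illustrative labels.
EVENTS = {
    "alert received": datetime(2019, 1, 14, 20, 57, 25),
    "MAGIC-I tracking": datetime(2019, 1, 14, 20, 57, 52, 858000),
    "MAGIC-II tracking": datetime(2019, 1, 14, 20, 57, 53, 260000),
    "data acquisition start": datetime(2019, 1, 14, 20, 58, 0),
    "stable data acquisition": datetime(2019, 1, 14, 20, 58, 5),
}


def offset_seconds(t: datetime, t0: datetime = T0) -> float:
    """Seconds elapsed between the trigger time and a later timestamp."""
    return (t - t0).total_seconds()


offsets = {name: offset_seconds(t) for name, t in EVENTS.items()}

# Time between receiving the alert and MAGIC-I being on target.
slew_seconds = offsets["MAGIC-I tracking"] - offsets["alert received"]

for name, dt in offsets.items():
    print(f"{name}: T0 + {dt:.3f} s")
print(f"slewing time: {slew_seconds:.3f} s")
```

The slewing time obtained this way is just under 28 s, in line with the "approximately 27 s" stated above.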

Observations were performed in the presence of moonlight, implying
a relatively high night sky background (NSB), approximately 6 times
the level for dark observations (moonless nights with good weather
conditions). Data taking for GRB 190114C stopped on 15 January
2019, 01:22:15 UT, when the target reached a zenith angle of 81.14° and 
an azimuth angle of 232.6°. The total exposure time for GRB 190114C 
was 4.12 h. 


MAGIC data analysis for GRB 190114C
Data collected from GRB 190114C were analysed using the standard
MAGIC analysis software and with the analysis chain tuned for data
taken under moonlight conditions. No detailed information on the
atmospheric transmission was available because the LIDAR facility was
not operating during the night of the observation. Therefore, the quality 
of the data was assessed by checking other auxiliary weather-monitoring 
devices, as well as the value and stability of the data acquisition rates. 

A dedicated set of Monte Carlo simulation γ-ray data was produced
for the analysis, matching the trigger settings (discriminator thresholds),
the zenith-azimuth distribution and the NSB level of the GRB
190114C observations. The final dataset comprises events starting from
20:58:05 UT. Owing to the higher NSB compared to standard analysis,
a higher level of image cleaning was applied to both the measured and
the Monte Carlo data, and a higher cut on the integrated charge of the
event image, set to 80 photoelectrons, was used for evaluating photon
fluxes. The significance of the γ-ray signal was computed using the
Li & Ma method.

The spectra in Fig. 2 were derived by assuming a simple power-law
function for the intrinsic spectrum,

$$\frac{dF}{dE} = f_0 \left(\frac{E}{E_0}\right)^{\alpha},$$

with the forward-folding method to derive the best-fit parameters
and the Schmelling unfolding prescription for the spectral
points, starting from the observed spectrum and correcting
for EBL attenuation with the model of Dominguez et al. The
best-fit values are $\alpha_{\rm int} = -2.22^{+0.23}_{-0.25}$ (statistical) $^{+0.21}_{-0.26}$ (systematic) and
$f_{0,\rm int} = [8.45^{+0.68}_{-0.65}$ (statistical) $^{+4.42}_{-3.97}$ (systematic)$] \times 10^{-9}$ TeV$^{-1}$ cm$^{-2}$ s$^{-1}$ at
$E_{0,\rm int} = 0.46$ TeV. We note that owing to the soft spectrum of the source, the
systematic errors reported here are larger than those given in ref. 7.

The absolute energy scale for MAGIC measurements is systematically 
affected by the imperfect knowledge of different aspects, such as the 
atmospheric transmission, the mirror reflectance and the properties of 
photomultipliers. A dedicated study identified the light-scale matching
of measured and Monte Carlo data as the most important contribution
to the systematic errors on the absolute energy scale. A miscalibration
of the Monte Carlo energy scale can lead to mis-reconstruction of the
spectrum that affects both the flux and the spectral shape, especially at
the lowest energies. This study demonstrated that the reconstructed
spectra for MAGIC are affected by a systematic error due to the variation
of the light scale by less than ±15%. In the case of moonlight observations,
additional systematic effects on the flux arise from mismatches
between Monte Carlo and measured data, in particular of the trigger
discriminator thresholds and of the higher noise in the photomultipliers.
Dedicated studies for moonlight observations reveal that these
errors affect only the overall flux (and not the spectral index) and depend 
on the NSB level. The contribution to the systematic error from the 
moonlight observations is minor compared to that due to the light- 
scale variations. Moreover, in the case of GRB 190114C, the influence 
of moonlight conditions on the overall systematic errors is mitigated 
by the improved data-Monte Carlo agreement achieved by simulating 
the recorded trigger discriminator thresholds and NSB during the GRB 
190114C observation. For the analysis of the GRB 190114C data, we repro- 
duced the effect of the light-scale variations on the spectra to derive the 
systematic errors on the energy flux and the errors on the photon index 
reported in Extended Data Table 1. The light-scale modifications were 
applied to the spectra before their deconvolution with EBL attenuation, 
which ultimately affects the low- and high-energy ends of the spectra 
in different ways. The fit to the obtained curves was performed in the 
same manner as the nominal case. Finally, the systematic errors were 
obtained from the difference of the parameter values computed for 
the nominal case and for the cases of light-scale variations by ±15%.

An additional systematic effect originates from uncertainties in existing
EBL models. To quantify the corresponding systematic errors on
the derived photon indices, the observed spectra were corrected by
adopting several EBL models for the redshift of this GRB. The results
can be found in Extended Data Table 4. The spectral indices inferred
using different EBL models differ by less than their statistical uncertainties
(one standard deviation). Taking as reference the EBL model of
Dominguez et al., the spectral index for the time-integrated spectrum
has an additional systematic error due to uncertainties in the EBL such
that $\alpha_{\rm int} = -2.22^{+0.23}_{-0.25}$ (statistical) $^{+0.21}_{-0.26}$ (systematic) $^{+0.07}_{-0.17}$ (systematic$_{\rm EBL}$).
The observed spectrum in the 0.2-1.0 TeV energy range can be roughly
described by a power law with photon index $\alpha_{\rm obs} = -5.43 \pm 0.22$ (statistical)
and flux normalization $f_{0,\rm obs} = [4.09 \pm 0.34$ (statistical)$] \times 10^{-10}$ TeV$^{-1}$ cm$^{-2}$ s$^{-1}$
at 0.475 TeV.

The upper limit for the first non-significant energy bin in the 
observed spectrum shown in Fig. 2 is calculated from a likelihood ratio 
test between two models. The first, baseline, model considers only 
background events and spillover events from lower energy. The sec- 
ond model additionally assumes that the spectrum extends to higher 
energy as an unbroken power law, with the flux normalization as a free 
parameter. Given the low event statistics in the higher-energy bins, the 
validity of the upper limit was checked by performing 10,000 Monte 
Carlo simulations of the likelihood ratio test. The test statistic distri- 
bution derived from this toy simulation was then used to determine 
the upper limit on the flux at 95% confidence level. The corresponding
upper limit for the intrinsic spectrum was derived from that for the
observed spectrum by correcting for EBL attenuation.

The time-dependent, EBL-corrected energy flux values shown in Fig. 1
and reported in Extended Data Table 1 were computed with an analytical
procedure. For each time bin, the value of the energy flux was computed
as the integral between 0.3 and 1 TeV of the best-fit spectral power-law
function derived with the forward-folding method. Accordingly, the
errors were calculated analytically through standard procedures for
error propagation, taking into account the covariance matrix. Moreover,
the analytical results were checked against those computed with a
toy Monte Carlo simulation, which gave comparable results.
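
The analytical procedure can be sketched as follows, assuming a power law $dF/dE = f_0 (E/E_0)^{\alpha}$ and first-order (linear) error propagation. The parameter values are illustrative, taken to resemble the time-integrated intrinsic spectrum, and the covariance matrix (uncorrelated, with assumed variances) is purely hypothetical because the actual fit covariance is not given in the text.

```python
import math

TEV_TO_ERG = 1.602176634  # 1 TeV expressed in erg


def energy_flux(f0, alpha, e0, e_lo, e_hi):
    """Integral of E * f0 * (E/e0)**alpha from e_lo to e_hi.

    f0 in TeV^-1 cm^-2 s^-1 and energies in TeV give TeV cm^-2 s^-1.
    """
    a = alpha + 2.0
    x_lo, x_hi = e_lo / e0, e_hi / e0
    if abs(a) < 1e-12:  # special case alpha = -2
        integral = math.log(x_hi / x_lo)
    else:
        integral = (x_hi ** a - x_lo ** a) / a
    return f0 * e0 ** 2 * integral


def energy_flux_error(f0, alpha, cov, e0, e_lo, e_hi, step=1e-5):
    """Linear error propagation with a 2x2 covariance matrix of (f0, alpha)."""
    d_f0 = energy_flux(1.0, alpha, e0, e_lo, e_hi)  # flux is linear in f0
    d_alpha = (energy_flux(f0, alpha + step, e0, e_lo, e_hi)
               - energy_flux(f0, alpha - step, e0, e_lo, e_hi)) / (2.0 * step)
    variance = (d_f0 ** 2 * cov[0][0]
                + 2.0 * d_f0 * d_alpha * cov[0][1]
                + d_alpha ** 2 * cov[1][1])
    return math.sqrt(variance)


# Illustrative parameters (assumed), energies in TeV.
f0, alpha, e0 = 8.45e-9, -2.22, 0.46
cov = [[(0.68e-9) ** 2, 0.0],   # hypothetical: no f0-alpha correlation
       [0.0, 0.24 ** 2]]

flux_erg = energy_flux(f0, alpha, e0, 0.3, 1.0) * TEV_TO_ERG
flux_err_erg = energy_flux_error(f0, alpha, cov, e0, 0.3, 1.0) * TEV_TO_ERG
print(f"energy flux (0.3-1 TeV): {flux_erg:.2e} +/- {flux_err_erg:.2e} erg cm^-2 s^-1")
```

In a real fit the off-diagonal covariance term is generally non-zero and can change the propagated error noticeably, which is why the text stresses that the covariance matrix was taken into account.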

The lower limits on the maximum event energy were computed by
an iterative procedure in which a power-law model was assumed for
the intrinsic spectrum and a different cut was applied to the maximum
event energy for each iteration. For each value of the energy cut, a
forward-folding fit was performed and a $\chi^2$ value was obtained. The final
result was obtained by finding the value of the energy cut for which the
$\chi^2$ variation corresponded to a given confidence level, set here to 95%.

The number of events in each time and energy bin shown in Fig. 3 
was computed using the forward-folding EBL-corrected spectrum, the 
instrument effective area and the effective time of the observation. For 
the highest-energy bins, the corresponding numbers for the time interval
between $T_0$ + 62 s and $T_0$ + 1,227 s are listed in Extended Data Table 2.

The numbers of observed excess events in bins of estimated energy
are reported in Extended Data Table 3. Also listed are the expected
numbers of photons in the same energy bins, obtained from the
power-law model of the intrinsic spectrum by convolving it with the effect of
EBL attenuation and the instrument response function for the zenith 
angles of this observation. We note that the counts in bins of estimated 
energy cannot be used to derive physical inferences. Spectral informa- 
tion that is physically meaningful must be computed as a function of 
the true energy of the events through an unfolding procedure using 
the energy migration matrix. Figure 2 shows such unfolded spectra 
(both intrinsic and observed) as a function of the true event energies. 


Fermi-LAT data analysis for GRB 190114C 

The publicly available Pass 8 (P8R3) LAT data for GRB 190114C were
processed using the Conda Fermitools v1.0.2 package, distributed by
the Fermi collaboration (https://fermi.gsfc.nasa.gov/ssc/data/analysis/software/).
Events of the 'Transient' class (P8R3_TRANSIENT020_V2)
were selected within 10° from the source position. We assumed a power-law
spectrum in the 0.1-10 GeV energy range, also accounting for the
diffuse Galactic and extragalactic backgrounds, as described in the
analysis manual (https://fermi.gsfc.nasa.gov/ssc/data/analysis/scitools/).
To compute the source fluxes, we first checked that the spectral
index was consistent with −2 for the entire 62-180 s interval after $T_0$, and
then repeated the fit, fixing the index to this value. The LAT energy flux
shown in Fig. 1 was computed as the integral of the best-fit power-law 
model within the corresponding energy range. 


XRT light curve 

The XRT light curve shown in Fig. 1 was derived using the online analysis
tool that is publicly available at the Swift-XRT repository
(http://www.swift.ac.uk/xrt_curves/). The spectral data collected in the 'windowed
timing' mode suffered from an instrumental effect, causing a non-physical
excess of counts below ~0.8 keV. To remove this effect,
we considered the best-fit model of spectral data above 1 keV and esti- 
mated a conversion factor from the number of counts to deabsorbed 
flux equal to 10° erg cm” per count. To obtain the energy-flux light 
curve, we applied this conversion factor to the count rate as a function 
of time in the interval 62-2,000s. 


Synchrotron burnoff limit for the afterglow emission 
GRB afterglows are triggered by external shocks that decelerate and
dissipate their kinetic energy in the ambient medium, consequently
producing a nonthermal distribution of electrons via mechanisms such
as shock acceleration. The maximum energy of electrons that can be
attained in the reference frame comoving with the post-shock region
can be estimated by equating the timescales of acceleration, $\tau_{\rm acc}$, and
energy loss, $\tau_{\rm loss}$; the latter is primarily due to synchrotron radiation.
These are expected to scale with the electron Lorentz factor, $\gamma$, and the
magnetic field strength, $B$, as $\tau_{\rm acc} \propto \gamma B^{-1}$ and $\tau_{\rm loss} \propto \gamma^{-1} B^{-2}$, so that the
maximum electron Lorentz factor is $\gamma_{\rm max} \propto B^{-1/2}$. Thus, the maximum
energy of synchrotron emission $\varepsilon'_{\rm syn,max} \propto B\gamma_{\rm max}^{2}$ is independent of $B$. Its
numerical value in the shock comoving frame is $\varepsilon'_{\rm syn,max} \approx 50$-100 MeV,
which is determined only from fundamental constants and a factor of
order 1 that characterizes the uncertainties in the acceleration timescale.
The observed spectrum of afterglow synchrotron emission is
then expected to display a cutoff below the energy
$\varepsilon_{\rm syn,max} \approx 100\ {\rm MeV} \times [\Gamma_{\rm b}(t)/(1+z)]$, which depends only on the
time-dependent bulk Lorentz factor $\Gamma_{\rm b}(t)$ of the external shock. To estimate
$\varepsilon_{\rm syn,max}$ and its evolution, we use the $\Gamma_{\rm b}(t)$ values derived from solutions
to the dynamical equations of the external shock. The resulting curves
for $\varepsilon_{\rm syn,max}$ are shown for cases of a medium with constant density
($n$ = constant) and a medium with a radial density profile of $n(R) = AR^{-2}$
(with $A = 3 \times 10^{35} A_*$ cm$^{-1}$, where $A_*$ is a parameter characterizing the
normalization of the density), expected when a dense stellar wind is
produced by the progenitor star (dotted and dashed lines in Fig. 3,
respectively). These curves have been derived assuming small values
for the density ($n = 0.01$ and $A_* = 0.01$) and the efficiency of prompt
emission ($\eta_\gamma = 1\%$), which imply a large value for the isotropic-equivalent
blast-wave kinetic energy ($E_{\rm k,aft} = E_{\rm iso}(1 - \eta_\gamma)/\eta_\gamma$), resulting in high values
of $\varepsilon_{\rm syn,max}$. Even with such extreme assumptions, the energies of photons
detected by MAGIC are well above $\varepsilon_{\rm syn,max}$ (Fig. 3).
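
The scaling argument above can be written out explicitly. The sketch below assumes the standard textbook forms (Gaussian units) for a gyration-limited acceleration time and for the synchrotron cooling time; $\eta$ is an order-unity acceleration factor, $\sigma_{\rm T}$ the Thomson cross-section and $\alpha_{\rm F}$ the fine-structure constant, and $\Gamma_{\rm b}$ denotes the bulk Lorentz factor of the shock. The numerical prefactor depends on how the characteristic synchrotron energy is defined, so the final number is indicative only.

```latex
\begin{align}
\tau_{\rm acc} &\simeq \eta\,\frac{\gamma m_e c}{eB},
\qquad
\tau_{\rm loss} \simeq \frac{6\pi m_e c}{\sigma_{\rm T}\,\gamma B^{2}}, \\
\tau_{\rm acc} = \tau_{\rm loss}
  \;&\Rightarrow\;
  \gamma_{\rm max}^{2} \simeq \frac{6\pi e}{\eta\,\sigma_{\rm T} B}
  \;\propto\; B^{-1}, \\
\varepsilon'_{\rm syn,max} &\sim \frac{\hbar e B}{m_e c}\,\gamma_{\rm max}^{2}
  = \frac{9}{4\eta}\,\frac{m_e c^{2}}{\alpha_{\rm F}}
  \approx \frac{160~{\rm MeV}}{\eta}, \\
\varepsilon_{\rm syn,max} &\simeq \varepsilon'_{\rm syn,max}\,
  \frac{\Gamma_{\rm b}(t)}{1+z}.
\end{align}
```

The magnetic field cancels in the third line because $\sigma_{\rm T} = (8\pi/3)\,e^{4}/(m_e^{2}c^{4})$, leaving only $m_e c^{2}/\alpha_{\rm F} \approx 70$ MeV times a factor of order 1, which is consistent with the 50-100 MeV range quoted in the text for modest values of $\eta$.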


Constraints on proton synchrotron afterglow emission 
Synchrotron emission by protons accelerated to ultrahigh energies in
the external shock has been proposed as a mechanism for
gigaelectronvolt-teraelectronvolt emission in GRB afterglows, potentially at energies
above the burnoff limit for electron synchrotron emission.
We discuss whether this process provides a viable explanation for the
teraelectronvolt emission observed here, following a previously published
formulation. For the case of a uniform external medium with density $n = n_0$ cm$^{-3}$,
the maximum expected energy of proton synchrotron emission in the
observer frame is

$$\varepsilon_{\rm psyn,max} = (7.6\ {\rm GeV})\,\eta^{-2}\,\epsilon_B^{3/2}\,(n_0 E_{\rm k,53})^{3/4}\,t_{\rm s}^{-1/4}\,(1+z)^{-3/4} \tag{1}$$

where $E_{\rm k,aft} = 10^{53} E_{\rm k,53}$ erg, $t_{\rm s}$ is the observer time after the burst in seconds,
$\epsilon_B$ is the fraction of energy in magnetic fields relative to that dissipated
behind the shock, and $\eta$ is a factor of order 1 that characterizes
the acceleration timescale. Even when assuming optimistic values of
$\epsilon_B = 0.5$ and $\eta = 1$, realizing $\varepsilon_{\rm psyn,max} \gtrsim 1$ TeV at $t \approx 100$ s for a GRB at $z = 0.42$
requires $n_0 E_{\rm k,53} \gtrsim 10^{4}$, which is a very high value for the product of the
blastwave energy and the external medium density.
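
The requirement on the product of density and blast-wave energy can be checked numerically. The sketch below assumes that equation (1) has the form $\varepsilon_{\rm psyn,max} = 7.6\ {\rm GeV}\ \eta^{-2} \epsilon_B^{3/2} (n_0 E_{\rm k,53})^{3/4} t_{\rm s}^{-1/4} (1+z)^{-3/4}$, which is the scaling expected when the proton acceleration time is limited by the dynamical time of the shock; the function names are illustrative.

```python
def proton_sync_max_energy_gev(eta, eps_b, n0_ek53, t_s, z):
    """Maximum proton synchrotron photon energy in GeV (assumed form of eq. 1)."""
    return (7.6 * eta ** -2 * eps_b ** 1.5 * n0_ek53 ** 0.75
            * t_s ** -0.25 * (1.0 + z) ** -0.75)


def required_n0_ek53(target_gev, eta, eps_b, t_s, z):
    """Value of n0 * E_k,53 needed to reach a target maximum photon energy."""
    base = proton_sync_max_energy_gev(eta, eps_b, 1.0, t_s, z)
    return (target_gev / base) ** (4.0 / 3.0)


# Optimistic parameters quoted in the text: eps_B = 0.5, eta = 1,
# t = 100 s, z = 0.42, target energy of 1 TeV = 1000 GeV.
needed = required_n0_ek53(1000.0, eta=1.0, eps_b=0.5, t_s=100.0, z=0.42)
print(f"required n0 * E_k,53: {needed:.2e}")
```

Under these assumptions the required product is a little above $10^{4}$, in agreement with the order of magnitude stated in the text.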

Even more severe is the requirement to reproduce the observed
teraelectronvolt flux and spectrum. Assuming a power-law energy distribution
with index $-p$ for the accelerated protons, their synchrotron
emission is expected to have a single power-law spectrum with photon
index $\alpha_{\rm int} = -(p+1)/2$, extending from a minimum energy

$$\varepsilon_{\rm m} = (3.7 \times 10^{-2}\ {\rm eV})\,\xi_p^{-2}\,\epsilon_p^{2}\,\epsilon_B^{1/2}\,E_{\rm k,53}^{1/2}\,t_{\rm s}^{-3/2}\,(1+z)^{1/2} \tag{2}$$

with differential energy flux

$$f(\varepsilon = \varepsilon_{\rm m}) = (1.3 \times 10^{-26}\ {\rm erg\ cm^{-2}\ s^{-1}\ Hz^{-1}}) \times \xi_p\,\epsilon_B^{1/2}\,n_0^{1/2}\,E_{\rm k,53}\,D_{28}^{-2}\,(1+z) \tag{3}$$

up to $\varepsilon = \varepsilon_{\rm psyn,max}$, where $\xi_p$ is the fraction of the number of protons swept
up by the shock that are accelerated, $\epsilon_p$ is the fraction of the energy of
the accelerated protons relative to that dissipated behind the shock,
and $D = 10^{28} D_{28}$ cm is the luminosity distance of the GRB. The observed
intrinsic spectral index $\alpha_{\rm int} \approx -2$ at $t \approx 100$ s implies $p \approx 3$. If $p = 3$ and the
spectrum extends to $\varepsilon = 1$ TeV without a cutoff, the energy flux at 1 TeV is

$$F(\varepsilon = 1\ {\rm TeV}) = (1.1 \times 10^{-13}\ {\rm erg\ cm^{-2}\ s^{-1}}) \times \xi_p^{-1}\,\epsilon_p^{2}\,\epsilon_B\,n_0^{1/2}\,E_{\rm k,53}^{3/2}\,D_{28}^{-2}\,t_{\rm s}^{-3/2}\,(1+z)^{3/2} \tag{4}$$

With optimistic assumptions of $\epsilon_B = 0.5$, $\eta = 1$, $\epsilon_p = 0.5$ and $\xi_p = 0.1$,
accounting for the observed 0.3-1 TeV flux at $t \approx 100$ s of
$F \approx 4 \times 10^{-8}$ erg cm$^{-2}$ s$^{-1}$ necessitates $n_0^{1/2} E_{\rm k,53}^{3/2} \gtrsim 10^{8}$. Even in the extreme
case of a GRB occurring at the centre of a dense molecular cloud with
$n = 10^{6}$ cm$^{-3}$, the blastwave energy must be $E_{\rm k,aft} \gtrsim 2 \times 10^{56}$ erg, greatly
exceeding the energy available for any plausible GRB progenitor. This
conclusion is qualitatively valid regardless of how the electron synchrotron
emission is modelled or whether the external medium has a
density profile characteristic of a progenitor stellar wind. Although
proton synchrotron emission may possibly explain the gigaelectronvolt
emission observed in some GRBs, it is not favoured as the origin of
the luminous teraelectronvolt emission observed in GRB 190114C,
owing to its low radiative efficiency. A more plausible mechanism may
be inverse Compton emission by accelerated electrons.
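
The energetics argument can be reproduced with a few lines of arithmetic. The sketch below assumes that the 1 TeV energy flux for $p = 3$ scales as $1.1 \times 10^{-13}\ \xi_p^{-1} \epsilon_p^{2} \epsilon_B n_0^{1/2} E_{\rm k,53}^{3/2} D_{28}^{-2} t_{\rm s}^{-3/2} (1+z)^{3/2}$ erg cm$^{-2}$ s$^{-1}$ (the product of the minimum energy and the differential flux at that energy), and takes the luminosity distance as about 2.3 Gpc; the coefficient, the density of $10^{6}$ cm$^{-3}$ and the function names are assumptions of this example.

```python
GPC_IN_CM = 3.0857e27  # one gigaparsec in centimetres


def proton_sync_flux_1tev(xi_p, eps_p, eps_b, n0, ek53, d28, t_s, z):
    """Energy flux at 1 TeV in erg cm^-2 s^-1 (assumed form of eq. 4, p = 3)."""
    return (1.1e-13 * xi_p ** -1 * eps_p ** 2 * eps_b * n0 ** 0.5
            * ek53 ** 1.5 * d28 ** -2 * t_s ** -1.5 * (1.0 + z) ** 1.5)


def required_ek53(target_flux, xi_p, eps_p, eps_b, n0, d28, t_s, z):
    """Blast-wave energy (units of 1e53 erg) needed to reach a target flux."""
    base = proton_sync_flux_1tev(xi_p, eps_p, eps_b, n0, 1.0, d28, t_s, z)
    return (target_flux / base) ** (2.0 / 3.0)


d28 = 2.3 * GPC_IN_CM / 1e28  # luminosity distance in units of 1e28 cm
params = dict(xi_p=0.1, eps_p=0.5, eps_b=0.5, d28=d28, t_s=100.0, z=0.42)

# Combination n0^(1/2) * E_k,53^(3/2) needed for F = 4e-8 erg cm^-2 s^-1.
combo = 4e-8 / proton_sync_flux_1tev(n0=1.0, ek53=1.0, **params)

# Energy needed in a very dense medium (assumed n = 1e6 cm^-3).
ek53_dense = required_ek53(4e-8, n0=1e6, **params)

print(f"n0^(1/2) * E_k,53^(3/2) >= {combo:.1e}")
print(f"E_k,aft >= {ek53_dense * 1e53:.1e} erg for n = 1e6 cm^-3")
```

With these inputs the required combination is close to $10^{8}$ and the blast-wave energy in the dense-cloud case is about $2 \times 10^{56}$ erg, which illustrates why the text regards the proton synchrotron interpretation as energetically implausible.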


Past teraelectronvolt-band observations of GRBs with MAGIC 
and other facilities 

Although the search for teraelectronvolt y-rays from GRBs has contin- 
ued over many years using a variety of experimental techniques, no 
clear detections had been previously achieved. Designed with the
primary goal of GRB follow-up observations, MAGIC has been responding
to GRB alerts since 15 July 2004. For the first five years, MAGIC operated
as a single telescope (MAGIC-I), reacting mainly to alerts from
Swift. After the second telescope (MAGIC-II) was added in 2009, GRB
observations have been carried out in stereoscopic mode. Excluding
cases when useful data could not be taken owing to hardware problems
or adverse weather conditions, 105 GRBs were observed from July 2004
to February 2019. Of these, 40 have determined redshifts, among which
8 and 3 have redshifts lower than 1 and 0.5, respectively. Observations
started less than 30 min after the burst for 66 events (of which 33 lack
redshifts) and less than 60 s for 14 events; the small number of events in
the latter case is mainly due to bad weather conditions or observational 
criteria not being fulfilled at the time of the alert. 

Despite 15 years of dedicated efforts, no unambiguous evidence for
γ-ray signals from GRBs had been seen by MAGIC before GRB 190114C.
The flux upper limits for GRBs observed in 2005-2006 were found to
be consistent with simple power-law extrapolations of their low-energy
spectra when EBL attenuation was taken into account. More detailed
studies were presented for GRB 080430 and GRB 090102, which
were observed simultaneously with MAGIC and other instruments
in different energy bands. Since 2013, GRB observations have been
performed with the new automatic procedure described above.
In addition, for some bright GRBs detected by Fermi-LAT, late-time 
observations have been conducted up to one day after the burst to 
search for potential signals extended in time. 

The case of GRB 190114C can be compared with other GRBs followed
up by MAGIC under similar conditions. Aside from the intrinsic spectrum,
the main factors affecting the detectability of a GRB by IACTs are
the redshift $z$ (stronger EBL attenuation for higher $z$), the zenith distance
(higher energy threshold for higher zenith distance), the external light
conditions and the delay time $T_{\rm delay}$ between the GRB and the beginning
of the observations. If we select GRBs with $z < 1$ and $T_{\rm delay} < 1$ h, only
four events remain, as listed in Extended Data Table 5. Except for GRB
190114C, these are all short GRBs, which is not surprising as they are
known to be distributed at redshifts appreciably lower than those of
long GRBs. A few other long GRBs with $z < 1$ and $T_{\rm delay} < 1$ h were followed
up by MAGIC, but the observations were not successful owing
to technical problems or adverse observing conditions. There is also
a fair fraction of events without measured redshifts. Assuming that
they follow the known $z$ distribution of long GRBs, ~20% of the events
are expected at $z < 1$. Since 30 long GRBs without redshifts were
observed by MAGIC with $T_{\rm delay} < 1$ h, only a few events with observing
conditions and $z$ similar to that of GRB 190114C are expected to be
observed during the whole MAGIC GRB campaign.

A similar analysis for past GRBs observed by other Cherenkov tel- 
escopes is not possible, because not all of the relevant ancillary infor- 
mation is available. However, summaries of past efforts have been 
reported. Of the 150 GRBs followed up by VERITAS until February 2018,
50 had observations starting within 180 s from the satellite trigger time.
H.E.S.S. also conducted several tens of GRB follow-up observations
until 2017. HAWC observed 64 GRBs until February 2017.
Milagrito and Milagro observed 54 GRBs from February 1997 to May
1998 and more than 130 GRBs from January 2000 to March 2008,
respectively. None of these considerable observational efforts provided
any convincing detection, although some hints at low significance
have been found. A case of particular interest was the Milagrito result
for GRB 970417A, although its statistical significance was not high
enough to fully rule out a background event. 


Data availability 


Raw data were generated at the MAGIC telescopes large-scale facility. 
Derived data supporting the findings of this study are available from 
the corresponding authors upon request. Source data for Figs. 1-3 are 
provided with the paper. 
Extended Data Fig. 1 | Light curves in the teraelectronvolt and
kiloelectronvolt bands for GRB 190114C. Photon flux light curve above
0.3 TeV measured by MAGIC (red; from $T_0$ + 62 s to $T_0$ + 210 s), compared with
that between 15 keV and 50 keV measured by Swift-BAT (grey; from $T_0$ to
$T_0$ + 210 s) and the photon flux above 0.3 TeV of the Crab Nebula (blue dashed
line). The errors on the MAGIC photon fluxes correspond to one standard
deviation. Vertical lines indicate the times when the alert was received
by MAGIC ($T_0$ + 22 s), when the tracking of the GRB by the telescopes started
($T_0$ + 50 s), when the data acquisition started ($T_0$ + 57 s), and when the data
acquisition system (DAQ) became stable ($T_0$ + 62 s; dotted line).
Extended Data Fig. 2 | Significance of the γ-ray signal between $T_0$ + 62 s and
$T_0$ + 1,227 s for GRB 190114C. Distribution of the squared angular distance, $\theta^2$,
for the MAGIC data (points) and background events (grey shaded area). $\theta^2$ is
defined as the squared angular distance between the nominal position of the
source and the reconstructed arrival direction of the events. The dashed
vertical line represents the value of the cut on $\theta^2$. This defines the signal region,
where the number of events coming from the source ($N_{\rm on}$) and from the
background ($N_{\rm off}$) are computed. The errors for 'on' events are derived from
Poissonian statistics. From $N_{\rm on}$ and $N_{\rm off}$, the number of excess events ($N_{\rm ex}$) is
computed. The significance is calculated using the Li & Ma method.

[Figure: $\theta^2$ distribution; x axis, $\theta^2$ (deg$^2$). Plot annotations: Time = 0.32 h; $N_{\rm on}$ = 895; $N_{\rm off}$ = 17.6 ± 1.9; $N_{\rm ex}$ = 877.4 ± 30.0; Significance (Li & Ma) = 51.4σ.]
Extended Data Table 1 | Energy flux between 0.3 and 1 TeV in selected time bins for GRB 190114C


Time bin [seconds after $T_0$]   Energy flux [erg cm$^{-2}$ s$^{-1}$]   Spectral index
62 — 100 [5.64 + 0.90 (stat) +324 (sys)]- 10-8 —-1.86 *9:38 (stat) +°.1? (sys) 
100 — 140 [3.31 + 0.67 (stat) #77) (sys)]-10~8 — -2.15 +943 (stat) *P (sys) 
140 — 210 [1.89 + 0.36 (stat) 1? (sys)]-10-8 — -2.31 +947 (stat) 19:1 (sys) 
210 — 361.5 [7.54 + 1.60 (stat) “P46 (sys)]-10-° — -2.53 +9.53 (stat) +9.2? (sys) 
361.5 — 800 [3.10 + 0.70 (stat) 429 (sys)]-10-° — -2.41 +98! (stat) +927 (sys) 
800 — 2454 [4.54 + 2.04 (stat) +788 (sys)]-10-1° -3.10 +087 (stat) 19:75 (sys) 
62 — 2454 (time integrated) - -2.22 1058 (stat) “2! (sys) 


Values listed correspond to the light curve in Fig. 1. For each time bin, columns represent the start and end time of the bin, the EBL-corrected energy flux in the 0.3-1 TeV range, and the best-fit 
spectral photon indices. The last row reports the value of the intrinsic spectral index for the time-integrated spectrum (Fig. 2). The reported statistical errors (stat) correspond to one standard 
deviation, whereas systematic errors (sys) are derived from the variation of the light scale by ±15% (see Methods).


Extended Data Table 2 | Number of y-rays from GRB 190114C in the highest-energy bins 


Emin [TeV] Emax [TeV] Model counts in [Emin; Emax] Significance above Emin 


0.71 1.10 25.4 5.8 
1.10 1.70 4.1 2.5 
1.70 2.64 0.9 1.5 
2.64 4.09 0.1 0.1 


The number of γ-ray counts was estimated from the MAGIC data using the power-law spectral model for the time interval between $T_0$ + 62 s and $T_0$ + 1,227 s.
Extended Data Table 3 | Observed and expected number of events in estimated-energy bins for GRB 190114C


Eestmin [TeV] Eestmax [TeV] Observed photons Expected photons 


0.19    0.29    155 ± 13     219 ± 73
0.29    0.46    598 ± 26     564 ± 53
0.46    0.71    154 ± 13     180 ± 16
0.71    1.10    32 ± 6       28 ± 3
1.10    1.70    6.0 ± 2.9    5.6 ± 0.4
1.70    2.64    2.3 ± 1.8    1.2 ± 0.1

The number of expected events is calculated from the intrinsic spectrum power-law model, by convolving it with the effect of EBL attenuation and the instrument response function of the 
telescope for these large zenith angles. The energy binning in estimated energy matches the one in true energy (after unfolding) shown in Fig. 2 and Extended Data Table 2. The large 
uncertainty in the number of expected events in the lowest-energy bin is dominated by the uncertainty in the very low effective area of the telescopes close to the energy threshold of this 
analysis. The numbers reported in this table cannot be used directly for any physical inference. The measured spectrum needs to be first unfolded using the energy migration matrix.


Extended Data Table 4 | Spectral indices for different EBL models 


Time bin [seconds after $T_0$]   D11   F08   FI10   G12


62 — 100 “1.86729 -2.04:08° -1.81772 -1.951729 
100 — 140 “2.1504 2.32058 -2.0948 -2.23'07 
140 — 210 23104 2A80e -2250e) . <2897 2 
210 — 361.5 2.5308 -2.69:087 -2.4608? -2.6008 
361.5 — 800 2.4108! -2.58:05! -2.3408 -2.49%7 3 
800 — 2454 B.1078" 3.20 =2.06058 <3.081) 2 


62 — 2454 (time integrated) -2.22%0.23  -2.391023 2.15023 9 99+0.28 


The abbreviations refer to the EBL model adopted in each case. D11: Dominguez et al. (reported also in Extended Data Table 1); F08: Franceschini et al.; FI10: Finke et al.; G12:
Gilmore et al. The errors correspond to one standard deviation.
Extended Data Table 5 | List of GRBs observed under
adequate technical and weather conditions by MAGIC with
$z < 1$ and $T_{\rm delay} < 1$ h


Event          Redshift   $T_{\rm delay}$ (s)   Zenith angle (deg)
GRB 061217     0.83       786.0         59.9
GRB 100816A    0.80       1439.0        26.0
GRB 160821B    0.16       24.0          34.0
GRB 190114C    0.42       58.0          55.8

The zenith angle at the beginning of the observations is reported in the last column. All GRBs 
except GRB 061217 were observed in stereoscopic mode. GRB 061217, GRB 100816A and GRB 
160821B are short GRBs, whereas GRB 190114C is a long GRB. Observations of a few other long 
GRBs with the same criteria were also conducted but are not listed here, because they were 
affected by technical problems or adverse observing conditions. 
Long-duration γ-ray bursts (GRBs) originate from ultra-relativistic jets launched from
the collapsing cores of dying massive stars. They are characterized by an initial phase
of bright and highly variable radiation in the kiloelectronvolt-to-megaelectronvolt
band, which is probably produced within the jet and lasts from milliseconds to
minutes, known as the prompt emission. Subsequently, the interaction of the jet
with the surrounding medium generates shock waves that are responsible for the
afterglow emission, which lasts from days to months and occurs over a broad energy
range from the radio to the gigaelectronvolt bands. The afterglow emission is
generally well explained as synchrotron radiation emitted by electrons accelerated by
the external shock. Recently, intense long-lasting emission between 0.2 and 1
teraelectronvolts was observed from GRB 190114C. Here we report multi-frequency
observations of GRB 190114C, and study the evolution in time of the GRB
emission across 17 orders of magnitude in energy, from $5 \times 10^{-6}$ to $10^{12}$ electronvolts.
We find that the broadband spectral energy distribution is double-peaked, with the
teraelectronvolt emission constituting a distinct spectral component with power
comparable to the synchrotron component. This component is associated with the
afterglow and is satisfactorily explained by inverse Compton up-scattering of
synchrotron photons by high-energy electrons. We find that the conditions required
to account for the observed teraelectronvolt component are typical for GRBs,
supporting the possibility that inverse Compton emission is commonly produced in
GRBs.


On 14 January 2019, following an alert from the Neil Gehrels Swift Observatory
(hereafter Swift) and the Fermi satellite, the Major Atmospheric
Gamma Imaging Cherenkov (MAGIC) telescopes observed and detected
radiation up to at least 1 TeV from GRB 190114C. Before the MAGIC detection,
GRB emission had only been reported at much lower energies,
below 100 GeV, first by CGRO-EGRET and more recently by AGILE-GRID
and Fermi-LAT (see ref. ? for a recent review).

Detection of teraelectronvolt radiation opens a new window in the
electromagnetic spectrum for the study of GRBs. Its announcement
triggered an extensive campaign of follow-up observations. Owing to
the relatively low redshift of $z = 0.4245 \pm 0.0005$ (Methods) of the GRB
(corresponding to a luminosity distance of about 2.3 Gpc), a comprehensive
set of multi-wavelength data could be collected. We present
observations gathered from instruments onboard six satellites and 15
ground telescopes (radio, submillimetre, near-infrared (NIR), optical,
ultraviolet (UV), and very-high-energy γ-rays; see Methods) for the
first ten days after the burst. The frequency range covered by these
observations spans more than 17 orders of magnitude, from 1 to about
$2 \times 10^{17}$ GHz, the most extensive so far for a GRB. The light curves of GRB
190114C at different frequencies are shown in Fig. 1.
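
As a rough cross-check of the quoted luminosity distance, the sketch below integrates the comoving-distance integral for a flat ΛCDM cosmology. The cosmological parameters (H0 = 70 km/s/Mpc, Ωm = 0.3) and the function name are assumptions made here for illustration; the text does not state which cosmology was adopted.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s


def luminosity_distance_gpc(z, h0=70.0, omega_m=0.3, steps=1000):
    """Luminosity distance in Gpc for a flat LambdaCDM cosmology.

    h0 (km/s/Mpc) and omega_m are assumed illustrative values,
    not parameters taken from the paper.
    """
    omega_l = 1.0 - omega_m

    def inv_e(zp):
        # 1 / E(z), the inverse dimensionless Hubble rate
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + omega_l)

    # Composite Simpson rule for the comoving distance (steps must be even).
    h = z / steps
    total = inv_e(0.0) + inv_e(z)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * inv_e(i * h)
    comoving_mpc = (C_KM_S / h0) * total * h / 3.0
    return (1.0 + z) * comoving_mpc / 1000.0


d_l = luminosity_distance_gpc(0.4245)
print(f"D_L = {d_l:.2f} Gpc")
```

With these assumed parameters the result lands close to the 2.3 Gpc quoted above; other reasonable choices of H0 and Ωm shift it by a few per cent.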

The prompt emission of GRB 190114C was simultaneously observed
by several space missions covering the spectral range from 8 keV to
about 100 GeV (Methods). The prompt light curve shows a complex
temporal structure with several emission peaks (Methods, Extended
Data Fig. 1), with a total duration of about 25 s (see dashed line in
Fig. 1) and total radiated energy of $E_{\gamma,\rm iso} = (2.5 \pm 0.1) \times 10^{53}$ erg (isotropic
equivalent; 1 erg = $10^{-7}$ J) in the energy range 1-$10^4$ keV (ref. *). During the
time of inter-burst quiescence, at $t = 5$-15 s, and after the end of the last
prompt pulse, at $t \gtrsim 25$ s, the flux decays smoothly, following a power law
$F \propto t^{\alpha}$ as a function of time $t$, with $\alpha_{10\text{-}1{,}000\,\rm keV} = -1.10 \pm 0.01$ (ref. *). The
temporal and spectral characteristics of this smoothly varying component
support an interpretation in terms of afterglow synchrotron
radiation, making this one of the few clear cases of afterglow emission
detected in the band 10-$10^4$ keV during the prompt-emission phase. The
onset of the afterglow component is then estimated to occur around
$t \approx 5$-10 s (refs. "), implying an initial bulk Lorentz factor between
300 and 700 (Methods).

After about one minute from the start of the prompt emission, two
additional high-energy telescopes began observations: MAGIC and
Swift-XRT. The XRT (1-10 keV; blue data points in Fig. 1) and MAGIC
(0.3-1 TeV; green data points in Fig. 1) light curves decay with time as a
power law with decay indices of $\alpha_X = -1.36 \pm 0.02$ and $\alpha_{\rm TeV} \approx -1.51 \pm 0.04$,
respectively. The 0.3-1-TeV light curve shown in Fig. 1 was obtained
after correcting for attenuation by the extragalactic background light
(EBL). The teraelectronvolt-band emission is observable until about 40
min, much longer than the nominal duration of the prompt-emission
phase. The NIR-optical light curves (square symbols) show a more
complex behaviour. Initially, a fast decay is seen, where the emission is
probably dominated by the reverse-shock component. This is followed
by a shallower decay, and subsequently a faster decay at $t \gtrsim 10^5$ s. The
latter may indicate that the characteristic synchrotron frequency $\nu_m$
crosses the optical band (Extended Data Fig. 6), which is not atypical,
Fig. 1 | Multi-wavelength light curves of GRB 190114C. Energy flux at different
wavelengths, from radio to γ-rays, versus time after the BAT trigger, at
$T_0$ = 20:57:03.19 universal time (UT) on 14 January 2019. The light curve for the
energy range 0.3-1 TeV (green circles) is compared with light curves at lower
frequencies. Those for VLA (yellow square), ATCA (yellow stars), ALMA (orange
circles), GMRT (purple filled triangle) and MeerKAT (purple open triangles)
have been multiplied by $10^9$ for clarity. The vertical dashed line marks
approximately the end of the prompt-emission phase, identified as the end of
the last flaring episode. For the data points, vertical bars show the 1σ errors on
the flux, and horizontal bars represent the duration of the observation. The
fluxes in the V, r and K filters (pink, purple and grey filled squares, respectively)
have been corrected for extinction in the host and in our Galaxy; the
contribution from the host galaxy has been subtracted.


but usually occurs at earlier times. The relatively late time at which the
break appears in GRB 190114C would then imply a very large value of $\nu_m$,
placing it in the X-ray band at about $10^2$ s. The millimetre light curves
(orange symbols) also show an initial fast decay in which the emission
is dominated by the reverse shock, followed by emission at late times
with nearly constant flux (Extended Data Fig. 3).
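
To give a feel for what the X-ray and teraelectronvolt decay indices mean in practice, the following minimal sketch evaluates how much a pure power-law light curve fades between two times. The chosen start and end times (60 s and 2,400 s) are illustrative, and the function name is hypothetical.

```python
def flux_ratio(t_start, t_end, alpha):
    """Ratio F(t_end) / F(t_start) for a power-law decay F proportional to t**alpha."""
    return (t_end / t_start) ** alpha


# Decay indices quoted in the text; times are seconds after T0.
ALPHA_X = -1.36    # Swift-XRT, 1-10 keV
ALPHA_TEV = -1.51  # MAGIC, 0.3-1 TeV

for name, alpha in (("X-ray", ALPHA_X), ("TeV", ALPHA_TEV)):
    ratio = flux_ratio(60.0, 2400.0, alpha)
    print(f"{name}: flux at 2,400 s is {ratio:.2e} of its value at 60 s")
```

Because the teraelectronvolt index is slightly steeper, the MAGIC flux fades somewhat faster than the X-ray flux over the same interval, although both drop by more than two orders of magnitude.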

The spectral energy distributions (SEDs) of the radiation detected 
by MAGIC are shown in Fig. 2, where the whole duration of the emission 
detected by MAGIC is divided into five time intervals. For the first two 
time intervals, observations in the gigaelectronvolt and X-ray bands are 
also available. During the first time interval (68-110 s; blue data points 
and blue confidence regions), Swift-XRT, Swift-BAT and Fermi-GBM data 
show that the afterglow synchrotron component peaks in the X-ray 
band. At higher energies, up to 1 GeV, the SED is a decreasing function 
of energy, as supported by the Fermi-LAT flux between 0.1 and 0.4 GeV
(Methods). On the other hand, at even higher energies, the MAGIC flux 
above 0.2 TeV implies a spectral hardening. This evidence is independ- 
ent of the EBL model adopted to correct for the attenuation (Methods). 
This demonstrates that the newly discovered teraelectronvolt radiation 
is not a simple extension of the known afterglow synchrotron emission,
but a separate spectral component. 

The extended duration and the smooth, power-law temporal decay 
of the radiation detected by MAGIC (see green data points in Fig. 1) 
suggest an intimate connection between the teraelectronvolt emission 
and the broadband afterglow emission. The most natural candidate 
is synchrotron self-Compton (SSC) radiation in the external forward 
shock: the same population of relativistic electrons responsible for the 
afterglow synchrotron emission Compton up-scatters the synchrotron 
photons, leading to a second spectral component that peaks at higher 
energies. Teraelectronvolt afterglow emission can also be produced by 
hadronic processes, such as synchrotron radiation by protons accelerated
to ultrahigh energies in the forward shock. However, owing
Fig. 2 | Multi-band spectra in the time interval 68-2,400 s. Five time intervals
are considered: 68-110 s (blue), 110-180 s (yellow), 180-360 s (red), 360-625 s
(green) and 625-2,400 s (purple). MAGIC data points have been corrected for
attenuation caused by the EBL. Data from other instruments (Swift-XRT, Swift-BAT,
Fermi-GBM and Fermi-LAT) are shown for the first two time intervals. For
each time interval, LAT contour regions are shown, limiting the energy to the
range in which photons are detected. MAGIC and LAT contour regions are
drawn from the 1σ error of their best-fit power-law functions. For Swift data, the
regions show the 90% confidence contours for the joint fit for XRT and BAT,
obtained by fitting a smoothly broken power law to the data. Filled regions are
used for the first time interval (68-110 s).


to their typically low radiation efficiency, reproducing the luminous
teraelectronvolt emission observed here by such processes would imply
an unrealistically large power in accelerated protons. Teraelectronvolt
photons can also be produced via the SSC mechanism in internal shock
synchrotron models of the prompt emission. However, numerical modelling
(Methods) shows that prompt SSC radiation can account at most
for a limited fraction (<20%) of the observed teraelectronvolt flux, and
only at early times ($t \lesssim 100$ s). Henceforth, we focus on the SSC process
in the afterglow.

SSC emission has been predicted for GRB afterglows. However,
its quantitative significance has been uncertain because the SSC
luminosity and spectral properties depend strongly on the poorly 
constrained physical conditions in the emission region (for example, 
the magnetic field strength). The detection of the teraelectronvolt 
component in GRB 190114C and the availability of multi-band obser- 
vations offer the opportunity to investigate the relevant physics at a 
deeper level. SSC radiation may have been already detected in very 
bright GRBs, such as GRB 130427A, in which photons with energies
of 10-100 GeV are challenging to explain by synchrotron processes,
suggesting a different origin.

We model the full dataset (from the radio band to teraelectronvolt 
energies, for the first week after the explosion) as synchrotron plus SSC 
radiation, within the framework of the theory of afterglow emission 
from external forward shocks. The detailed modelling of the broad- 
band emission and its evolution with time is presented in Methods. 
We discuss here the implications for the emission at $t < 2{,}400$ s and
energies above 1 keV.

The soft spectra in the 0.2-1-TeV energy range (photon index $\Gamma_{\rm TeV} < -2$;
see Extended Data Table 1) constrain the peak of the SSC component
to below this energy range. The relatively small ratio between the spectral
peak energies of the SSC ($E_p^{\rm SSC} \lesssim 200$ GeV) and synchrotron
($E_p^{\rm syn} \approx 10$ keV) components implies a relatively low value for the electron
Lorentz factor ($\gamma \approx 2 \times 10^3$). This value is hard to reconcile with the
Fig. 3 | Modelling of the broadband spectra in the time intervals 68-110 s and
110-180 s. Thick blue curve, modelling of the multi-band data in the
synchrotron and SSC afterglow scenario. Thin solid lines, synchrotron and SSC
(observed spectrum) components. Dashed lines, SSC when internal γ-γ
opacity is neglected. The adopted parameters are: $s = 0$, $\epsilon_e = 0.07$, $\epsilon_B = 8 \times 10^{-5}$,
$p = 2.6$, $n_0 = 0.5$ and $E_k = 8 \times 10^{53}$ erg; see Methods. Empty circles show the
observed MAGIC spectrum, that is, uncorrected for attenuation caused by the
EBL. Contour regions and data points are as in Fig. 2.


observation of the synchrotron peak at energies higher than a kiloelectronvolt.
To explain the soft spectrum detected by MAGIC, it is necessary
to invoke scattering in the Klein-Nishina regime for the electrons
radiating at the spectral peak, as well as internal γ-γ absorption.
Although both of these effects tend to become less important with
time, the spectral index in the 0.2-1-TeV band remains constant in time
(or possibly evolves to softer values; Extended Data Table 1). This
implies that the SSC peak energy moves to lower energies and crosses 
the MAGIC energy band. The energy at which attenuation by internal 
pair production becomes important indicates that the bulk Lorentz 
factor is about 140-160 at 100s. 
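
The link between the ratio of peak energies and the electron Lorentz factor can be illustrated with the textbook Thomson-regime scaling, in which up-scattered photons emerge at roughly γ² times the seed photon energy. The sketch below is only that order-of-magnitude estimate, not the full modelling described in Methods; the order-unity prefactor and the function name are assumptions introduced here.

```python
import math


def thomson_lorentz_factor(e_ssc_ev, e_syn_ev, prefactor=1.0):
    """Electron Lorentz factor from E_ssc ~ prefactor * gamma**2 * E_syn.

    prefactor is an assumed order-unity constant whose exact value
    depends on the averaging convention adopted.
    """
    return math.sqrt(e_ssc_ev / (prefactor * e_syn_ev))


E_SSC_EV = 200e9  # upper bound on the SSC peak energy, 200 GeV
E_SYN_EV = 10e3   # synchrotron peak energy, about 10 keV

for k in (1.0, 4.0):
    gamma = thomson_lorentz_factor(E_SSC_EV, E_SYN_EV, prefactor=k)
    print(f"prefactor {k:.0f}: gamma ~ {gamma:.1e}")
```

For any order-unity prefactor the estimate stays at a few thousand, which is why a synchrotron peak above a kiloelectronvolt is difficult to accommodate without Klein-Nishina and absorption effects.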

An example of the theoretical modelling in this scenario is shown 
in Fig. 3 (blue solid curve; see Methods for details). The dashed line 
shows the SSC spectrum when internal absorption is neglected. The 
thin solid line shows the model spectrum including EBL attenuation, 
in comparison to the MAGIC observations (empty circles). 

We find that acceptable models of the broadband SED can be obtained
if the conditions at the source are the following. The initial kinetic
energy of the blast wave is $E_k \gtrsim 3 \times 10^{53}$ erg (isotropic-equivalent). The
electrons swept up from the external medium are efficiently injected
into the acceleration process and carry a fraction $\epsilon_e = 0.05$-0.15 of the
energy dissipated at the shock. The acceleration mechanism produces
an electron population characterized by a non-thermal energy distribution,
described by a power law with index $p = 2.4$-2.6, an injection
Lorentz factor of $\gamma_m = (0.8\text{-}2) \times 10^4$ and a maximum Lorentz factor of
$\gamma_{\max} \approx 10^8$ (at about 100 s). The magnetic field behind the shock conveys
a fraction $\epsilon_B \approx (0.05\text{-}1) \times 10^{-3}$ of the dissipated energy. At $t \approx 100$ s, corresponding
to a distance from the central engine of $R \approx (8\text{-}20) \times 10^{16}$ cm,
the density of the external medium is $n = 0.5$-5 cm$^{-3}$ and the magnetic
field strength is $B \approx 0.5$-5 G. The latter implies that the magnetic field
was efficiently amplified from values of a few microgauss, which are
typical of the unshocked ambient medium, owing to plasma instabilities
or other mechanisms. Not surprisingly, we find that $\epsilon_e \gg \epsilon_B$, which is a
necessary condition for the efficient production of SSC radiation.
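
The gauss-level field strength can be related to the shock microphysics with the standard relativistic-shock expression B = Γ (32π ε_B n m_p c²)^{1/2}, which assumes that a fraction ε_B of the post-shock internal energy goes into the magnetic field. The sketch below is an illustration under that assumption; the input values (ε_B of order 10⁻⁴ to 10⁻³, Γ of about 150) are illustrative choices, not fit results, and the function name is hypothetical.

```python
import math

M_P_C2_ERG = 1.5033e-3  # proton rest energy in erg


def shock_magnetic_field_gauss(eps_b, n_cm3, bulk_lorentz):
    """Comoving field behind a relativistic shock.

    B = Gamma * sqrt(32 * pi * eps_B * n * m_p c^2), in gauss for n in cm^-3.
    """
    return bulk_lorentz * math.sqrt(32.0 * math.pi * eps_b * n_cm3 * M_P_C2_ERG)


# Illustrative inputs (assumptions): Gamma ~ 150 at about 100 s.
for eps_b, n in ((1e-4, 1.0), (1e-3, 5.0)):
    b = shock_magnetic_field_gauss(eps_b, n, 150.0)
    print(f"eps_B = {eps_b:.0e}, n = {n} cm^-3 -> B ~ {b:.1f} G")
```

Simple shock compression of a microgauss ambient field would fall short of these values by orders of magnitude, which is the sense in which the field must have been amplified.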


The blast-wave energy inferred from the modelling is comparable 
to the amount of energy released in the form of radiation during the 
prompt phase. The prompt-emission mechanism must then have dis- 
sipated and radiated no more than half of the initial jet energy, leaving 
the rest for the afterglow phase. The modelling of the multi-band data 
also allows us to infer how the total energy is shared between the synchrotron
and SSC components. The resultant powers of the two components
are comparable. We estimate that the energies in the synchrotron
and SSC components are about $1.5 \times 10^{52}$ erg and around $6.0 \times 10^{51}$ erg,
respectively, in the time interval 68-110 s, and about $1.3 \times 10^{52}$ erg and
around $5.4 \times 10^{51}$ erg, respectively, in the time interval 110-180 s. Thus,
previous studies of GRBs may have been missing a substantial fraction 
of the energy emitted during the afterglow phase that is essential to 
its understanding. 

Finally, we note that the values of the afterglow parameters inferred 
from the modelling fall within the range of values typically inferred from
broadband (radio to gigaelectronvolt) studies of GRB afterglow emission.
This points to the possibility that SSC emission in GRBs may be a
relatively common process that does not require special conditions to 
be produced, and its power is similar to that of synchrotron radiation. 

The SSC component may then be detectable at teraelectronvolt 
energies in other relatively energetic GRBs, as long as the redshift is 
low enough to avoid severe attenuation by the EBL. This also provides 
support to earlier indications for SSC emission at gigaelectronvolt
energies.
Methods


Prompt-emission observations 

On 14 January 2019, the prompt emission from GRB 190114C triggered
several space instruments, including Fermi-GBM, Fermi-LAT, Swift-BAT,
Super-AGILE, AGILE-MCAL, KONUS-Wind, INTEGRAL-SPI-ACS
and Insight-HXMT. The prompt-emission light curves from
AGILE, Fermi and Swift are shown in Fig. 1 and in Extended Data Fig. 1,
where the trigger time $T_0$ refers to the BAT trigger time (20:57:03.19 UT).
The prompt emission lasts for approximately 25 s, when the last flaring-emission
episode ends. Nominally, $T_{90}$ (that is, the time interval during
which a fraction between 5% and 95% of the total emission is observed)
is much longer (>100 s, depending on the instrument), but it is clearly
contaminated by the afterglow component (Fig. 1) and does not provide
a good measure of the actual duration of the prompt emission. A
more detailed study of the prompt emission phase is reported in ref. *.
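
The definition of the 5%-95% duration, and the way a smooth afterglow tail inflates it, can be sketched with a toy light curve. Everything below is a hypothetical illustration: the function name, the binning and the count rates are invented for the example and do not correspond to any instrument's data.

```python
def t90(times, counts):
    """Return (t05, t95, T90) from a background-subtracted binned light curve.

    times  : bin-centre times in seconds, ascending
    counts : background-subtracted counts per bin
    """
    total = float(sum(counts))
    if total <= 0:
        raise ValueError("light curve has no net signal")
    cumulative = 0.0
    t05 = t95 = None
    for t, c in zip(times, counts):
        cumulative += c
        if t05 is None and cumulative >= 0.05 * total:
            t05 = t  # 5% of the total counts accumulated
        if t95 is None and cumulative >= 0.95 * total:
            t95 = t  # 95% of the total counts accumulated
            break
    return t05, t95, t95 - t05


# Toy data: a 25-s burst, with and without a faint extended tail.
times = [i + 0.5 for i in range(200)]
burst_only = [100.0 if t < 25 else 0.0 for t in times]
with_tail = [100.0 if t < 25 else 10.0 for t in times]

print("burst only:", t90(times, burst_only)[2], "s")
print("with tail: ", t90(times, with_tail)[2], "s")
```

In this toy case the tail carries well under half of the total counts yet stretches the 5%-95% duration from roughly the burst length to most of the observing window, which mirrors the contamination described above.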


AGILE 

AGILE (Astrorivelatore Gamma ad Immagini Leggero) could observe
GRB 190114C until $T_0$ + 330 s, before it became occulted by the Earth. GRB
190114C triggered the MCAL (Mini-CALorimeter) from $T_0$ - 0.95 s to
$T_0$ + 10.95 s. The MCAL light-flux curve in Fig. 1 was produced using two
different spectral models. From $T_0$ - 0.95 s to $T_0$ + 1.8 s, the spectrum is
fitted by a power law with photon index [,,,= -1.97°954 (AN/dE = E'°*). 
From $T_0$ + 1.8 s to $T_0$ + 5.5 s the best-fit model is a broken power law with
Fyn. = ~ 1.872039), Iph,2 = ~ 2-63°0.0 and break energy £, = 756"}24 keV. 
The total fluence in the 0.4-100 MeV energy range is 
F=1.75 x 10 erg cm. The Super-AGILE detector also detected the 
burst, but the large off-axis angle prevented any X-ray imaging of the 
burst and any spectral analysis. Extended Data Fig. 1a, d, e shows 
the GRB 190114C light curves acquired by the Super-AGILE detector 
(20-60 keV) and by the MCAL detector in the low- (0.4-1.4 MeV) and 
high-energy (1.4-100 MeV) bands. 
Fermi-GBM

There are indications that at the time of the MAGIC observations some 
of the detectors were partially shadowed by the structural elements 
of the Fermi spacecraft that were not modelled in the response of the 
GBM (Gamma-ray Burst Monitor) detectors. This affects the low-energy 
part of the spectrum. For this reason, out of caution we elected to
exclude the energy channels below 50 keV. The spectra detected by
Fermi-GBM during the intervals $T_0$ + 68 s to $T_0$ + 110 s and $T_0$ + 110 s to
$T_0$ + 180 s are best described by a power-law model with photon index
$\Gamma_{\rm ph} = -2.10 \pm 0.08$ and $\Gamma_{\rm ph} = -2.05 \pm 0.10$, respectively (Figs. 2, 3). The
10-1,000-keV light curve in Extended Data Fig. 1c was constructed by
summing photon counts for the bright NaI detectors.


Swift-BAT 

The 15-350-keV mask-weighted light curve of the BAT (Burst Alert Telescope)
shows a multi-peaked structure that starts at $T_0$ - 7 s (Extended
Data Fig. 1b). The 68-110 s and 110-180 s spectra shown in Figs. 2, 3 
were derived from a joint XRT-BAT fit. The best-fitting parameters 
for the whole interval (68-180 s) are: column density, 
Ny= (7.53707#) x 1072 cm at z= 0.42, in addition to the galactic value 
of 7.5 x 10” cm™; low-energy photon index, /,,,1= — 1.21°)3¢ ; high- 
energy spectral index, /,, .=- 2.197039. and peak energy E,>14.5 keV. 
Errors are given at 90% confidence level. 


Fermi-LAT 

Fermi-LAT (Large Area Telescope) detected a γ-ray counterpart from
the prompt phase onward. The burst left the LAT field of view at $T_0$ + 150 s and
remained outside it until $T_0$ + 8,600 s. The light curve in the energy
range 0.1-10 GeV is shown in Extended Data Fig. 1f. The LAT spectra 
in the time bins 68-110 s and 110-180 s (Figs. 2, 3) are described by 
a power law with pivot energies of 200 MeV and 500 MeV, photon 
indices $\Gamma_{\rm ph}(68\text{-}110) = -2.02 \pm 0.95$ and $\Gamma_{\rm ph}(110\text{-}180) = -1.69 \pm 0.42$, and
normalization factors of $N_{0,68\text{-}110} = (2.02 \pm 1.31) \times 10^{-7}$ MeV$^{-1}$ cm$^{-2}$ s$^{-1}$ and
$N_{0,110\text{-}180} = (4.48 \pm 2.10) \times 10^{-8}$ MeV$^{-1}$ cm$^{-2}$ s$^{-1}$, respectively. In each time
interval, the analysis was limited to the energy range in which pho- 
tons were detected. The LAT light curve integrated in the energy range 
0.1-1 GeV is shown in Fig. 1. 


MAGIC 

To analyse the data we used the standard MAGIC software and followed
the steps optimized for data taking under moderate moon illumination.
The spectral fitting was performed by a forward-folding
method, assuming a simple power law for the intrinsic spectrum and
taking into account the EBL effect, using the model of Domínguez
et al. Extended Data Table 1 shows the fitting results for various time
bins (the pivot energy is chosen to minimize the correlation between the 
normalization and photon index parameters). The data points shown 
in Figs. 2, 3 were obtained from the observed excess rates in estimated
energy, the fluxes of which were evaluated in true energy (photon cor- 
rected energy by Monte Carlo simulation, after reconstruction and 
unfolding) using the effective time and a spill-over-corrected effective 
area obtained from the best fit. 

The time-resolved analysis hints at a possible spectral evolution to
softer values, although we cannot exclude that the photon indices are
compatible with a constant value of about -2.5 up to 2,400 s. The signal
and background in the considered time bins are both in the low-count
Poisson regime. Therefore, the correct treatment of the MAGIC data 
provided here includes the use of Poisson statistics, as well as systematic 
errors. To estimate the main source of systematic errors—our imper- 
fect knowledge of the absolute instrument calibration and the total 
atmospheric transmission—we vary the light scale in our Monte Carlo 
simulation, as suggested in previous studies. The result is reported in 
the last two lines of Extended Data Table 1 and in Extended Data Fig. 2. 

The systematic effects deriving from the choice of one particular 
EBL model were also studied. The analysis performed to obtain the 
time-integrated spectrum was repeated, employing three other models.
The contribution to the systematic error on the photon index
caused by the uncertainty on the EBL model isa, = “019, whichis smaller 
than the statistical error only (one standard deviation), as already seen 
in a previous work. On the other hand, the contribution of the choice
of the EBL model to the systematic error on the normalization factor 
is only partially at the same level of the statistical error (one standard 
deviation), oy = *032 x10 ® The chosen EBL model returns a normaliza- 
tion factor that is lower than two of the other models and very close to 
the third one. 

The MAGIC energy-flux light curve that is presented in Fig. 1 was 
obtained by integrating the best-fit spectral model of each time bin 
from 0.3 to 1 TeV, in the same manner as in a previous study. The value
of the fitted time constant reported here differs by less than two standard
deviations from the one previously reported. The difference is due
to the poor constraints on the spectral-fit parameters of the last time 
bin, which influences the light-curve fit. 
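
The step from a fitted power-law photon spectrum to a band-integrated energy flux has a closed form, which the following sketch evaluates. It is a minimal illustration assuming a pure power law dN/dE = N0 (E/E_pivot)^Γ with no EBL term inside the integral; the parameter values in the example call are hypothetical and are not taken from Extended Data Table 1.

```python
import math


def energy_flux(n0, gamma, e_pivot, e_min, e_max):
    """Integrate E * dN/dE for dN/dE = n0 * (E / e_pivot)**gamma over [e_min, e_max].

    Units follow the inputs: n0 in photons / (TeV cm^2 s) and energies in TeV
    give an energy flux in TeV / (cm^2 s).
    """
    if math.isclose(gamma, -2.0):
        # Special case: the integrand is proportional to 1/E.
        return n0 * e_pivot ** 2 * math.log(e_max / e_min)
    k = gamma + 2.0
    return n0 * e_pivot ** 2 * ((e_max / e_pivot) ** k - (e_min / e_pivot) ** k) / k


TEV_TO_ERG = 1.602176634  # 1 TeV expressed in erg

# Hypothetical fit parameters, for illustration only.
flux_tev = energy_flux(n0=1e-8, gamma=-2.5, e_pivot=0.5, e_min=0.3, e_max=1.0)
print(f"0.3-1 TeV energy flux: {flux_tev * TEV_TO_ERG:.2e} erg cm^-2 s^-1")
```

Repeating this for the best-fit parameters of each time bin yields one light-curve point per bin, so a poorly constrained photon index in a single bin propagates directly into that point.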


X-ray afterglow observations 

Swift/XRT. Swift-XRT (X-Ray Telescope) started observing 68 s after 
$T_0$. The source light curve was taken from the Swift-XRT light-curve
repository and was converted into 1-10-keV flux (Fig. 1) through dedicated
spectral fits. The combined XRT + BAT spectral fit in Figs. 2, 3 is
described above. 


XMM-Newton and NuSTAR. The XMM-Newton X-ray observatory 
and the Nuclear Spectroscopic Telescope Array (NuSTAR) started
observing GRB 190114C under Director's Discretionary Time (DDT)
Target of Opportunities 7.5 h and 22.5 h, respectively, after the burst.
The XMM-Newton and NuSTAR absorption-corrected fluxes (Fig. 1)
were derived by fitting the spectrum with XSPEC and with the same
power-law model, considering absorption in our Galaxy and at the
redshift of the burst.


NIR, optical and UV afterglow observations 
Light curves from the different instruments presented in this section 
are shown in Extended Data Fig. 3. 


GROND. The Gamma-Ray burst Optical/Near-infrared Detector
(GROND) started observations 3.8 h after the GRB trigger, and the
follow-up continued until 29 January 2019. Image reduction and
photometry were carried out with standard IRAF tasks, as described
in refs. **°>. $JHK_s$ photometry was converted to AB magnitudes to provide
a common flux system. The final photometry is given in Extended
Data Table 2. 


BOOTES and GTC. The CASANDRA-1 ultra-wide-field camera at the
BOOTES-1 station in ESAt/INTA-CEDEA (Huelva, Spain) took an image of
the GRB 190114C location, starting at 20:57:18 UT (30 s exposure time)
(Extended Data Fig. 4). The Gran Canarias Telescope (GTC), equipped with
the OSIRIS spectrograph, started observations 2.6 h post-burst. The
grisms R1000B and R2500I were used, covering the wavelength range
3,700-10,000 Å (600 s exposure time for each grism). The GTC detected
a highly extinguished continuum, as well as Ca II H and K lines in absorption
and [O II], Hβ and [O III] in emission (see Extended Data Fig. 5), all
roughly at the same redshift of $z = 0.4245 \pm 0.0005$ (ref.°8). By comparing
the derived rest-frame equivalent widths with ref.°’, GRB 190114C clearly
shows higher than average, but not unprecedented, values.


HST. The Hubble Space Telescope (HST) imaged the afterglow and host 
galaxy of GRB190114C on 11 February and 12 March 2019. HST observa- 
tions clearly reveal that the host galaxy is spiral (Extended Data Fig. 4). 
A direct subtraction of the epochs of observations with the F850LP 
filter yields a faint residual close to the nucleus of the host (Extended 
Data Fig. 4). From the position of the residual we estimate that the burst 
originated within 250 pc of the host galaxy nucleus. 


LT. The robotic 2-m Liverpool Telescope (LT) slewed to the afterglow
location at coordinated universal time (UTC) 2019-01-14 23:22:34 and
on the second night from UTC 2019-01-15 19:32:10 and acquired images
in the B, g, V, r, i and z bands (45 s exposure each on the first night and
60 s on the second; see Extended Data Table 3). Aperture photometry
of the afterglow was performed using a custom IDL script with a fixed
aperture radius of 1.5″. Photometric calibration was performed relative
to stars from the Pan-STARRS1 catalogue.


NTT. The European Southern Observatory’s (ESO) New Technology 
Telescope (NTT) observed the optical counterpart of GRB 190114C 
under the extended Public ESO Spectroscopic Survey for Transient 
Objects (ePESSTO) using the NTT/EFOSC2 instrument in imaging 
mode. Observations started at 04:36:53 UT on 16 January 2019 with
g, r, i and z Gunn filters. Image reduction was carried out by following
the standard procedures.


OASDG. The 0.5-m remote telescope of the Osservatorio Astronomico 
‘S. Di Giacomo’ (OASDG), located in Agerola (Italy), started observa- 
tions in the optical $R_C$ band 0.54 h after the burst. The afterglow of GRB
190114C was clearly detected in all the images. 


NOT. The Nordic Optical Telescope (NOT) observed the optical afterglow
of GRB 190114C with the Alhambra Faint Object Spectrograph and
Camera (ALFOSC) instrument. Imaging was obtained in the griz filters
with 300-s exposures, starting at 14 January 2019 21:20:56 UT, 24 min
after the BAT trigger. The normalized spectrum (Extended Data Fig. 5)
reveals strong host interstellar absorption lines of Ca H and K and of
Na I D, which provided a redshift of $z = 0.425$.


REM. The 60-cm robotic Rapid Eye Mount telescope (REM) performed 
optical and NIR observations with the ROS2 optical imager and the 
REMIR NIR camera. Observations were performed starting about 3.8 h
after the burst in the r and J bands and lasted about one hour.


Swift/UVOT. The Swift UltraViolet and Optical Telescope (UVOT)
began observations at $T_0$ + 54 s in the UVOT v-band. The first observation
after settling was in the UVOT white band, started 74 s after the
trigger and lasted for 150 s. A 50-s exposure with the UV grism was
taken next, followed by multiple exposures rotating through all seven
broad- and intermediate-band filters, until switching to only the UVOT
clear white filter on 20 January 2019. Standard photometric calibration
and methods were used to derive the aperture photometry.
The grism zeroth-order data were reduced manually to derive the
B-magnitude and error. 


VLT. The STARGATE collaboration used the Very Large Telescope (VLT) 
and observed GRB190114C using the X-shooter spectrograph. Detailed 
analysis will be presented in forthcoming papers. A portion of the sec- 
ond spectrum is shown in Extended Data Fig. 5, illustrating the strong 
emission lines that are characteristic of a strongly star-forming galaxy, 
whose light is largely dominating over the afterglow at this epoch. 


Magnitudes of the underlying galaxies 

The HST images show a spiral or tidally disrupted galaxy whose bulge 
is coincident with the coordinates of GRB 190114C. A second galaxy is 
detected at an angular distance of 1.3” towards the northeast. The SED 
analysis was performed with LePhare using an iterative method that
combined both the resolved photometry of the two galaxies found 
in the HST and VLT/HAWK-I data and the blended photometry from 
GALEX and WISE, in which the spatial resolution was much lower. Further
details will be given in a separate paper (A.d.U.P. et al., manuscript
in preparation). The estimated photometry for each object and their 
combination is given in Extended Data Table 4. 


Optical extinction 

The optical extinction towards the line of sight of a GRB is derived 
assuming a power law as the intrinsic spectral shape. Once the Galactic
extinction ($E_{B-V} = 0.01$; ref. ”’) is taken into account and the fairly
bright host galaxy contribution is properly subtracted, a good fit to
the data is obtained with the Large Magellanic Cloud recipe and
$A_V = 1.83 \pm 0.15$. The spectral index $\beta$ ($F_\nu \propto \nu^{\beta}$) evolves from hard to
soft across the temporal break in the optical light curve at about 0.5 d,
moving from $\beta_{o,1} = -0.10 \pm 0.12$ to $\beta_{o,2} = -0.48 \pm 0.15$.
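
The size of the host-extinction correction implied by the fitted V-band extinction follows from the definition of the magnitude scale: an extinction of A magnitudes dims the flux by a factor of 10^(0.4 A). The sketch below applies this to the V band only; correcting other filters would additionally need the extinction curve (here the Large Magellanic Cloud recipe), which is not reproduced. The function name is hypothetical.

```python
def extinction_correction_factor(a_lambda_mag):
    """Multiplicative flux correction for an extinction of a_lambda_mag magnitudes."""
    return 10.0 ** (0.4 * a_lambda_mag)


A_V = 1.83      # host extinction in the V band (mag), from the fit quoted above
A_V_ERR = 0.15  # quoted uncertainty (mag)

best = extinction_correction_factor(A_V)
low = extinction_correction_factor(A_V - A_V_ERR)
high = extinction_correction_factor(A_V + A_V_ERR)
print(f"V-band flux correction: x{best:.1f} (range x{low:.1f} to x{high:.1f})")
```

The observed V-band afterglow flux is therefore suppressed by a factor of about five relative to the intrinsic one, and the uncertainty on the extinction alone changes the corrected flux by roughly 15%.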


Radio and submillimetre afterglow observations 
The light curves obtained by the different instruments are shown in 
Extended Data Fig. 3. 


ALMA. Observations with the Atacama Large Millimetre-Submillimetre 
Array (ALMA) are reported in Band 3 (central observed frequency of 
97.500 GHz) and Band 6 (235.0487 GHz) between 15 January and 19 Janu- 
ary 2019. The data were calibrated within CASA (Common Astronomy 
Software Applications; version 5.4.0) using the pipeline calibration.
Photometric measurements were also performed within CASA. Early 
ALMA observations at 97.5 GHz are taken from ref. '°.


ATCA. The Australia Telescope Compact Array (ATCA) observations were 
made with the ATCA 4-cm receivers (band centres of 5.5 and 9 GHz), 15-mm
receivers (band centres of 17 and 19 GHz) and 7-mm receivers (band centres
of 43 and 45 GHz). The ATCA data (see Extended Data Table 5) were obtained
using the CABB continuum mode and were reduced with the software
packages Miriad and CASA using standard techniques. The quoted errors
are 1σ, which include the root-mean-square (r.m.s.) and Gaussian 1σ errors.
GMRT. The upgraded Giant Metre-wave Radio Telescope (uGMRT)
observed on 17 January 2019 13.44 UT (2.8 d after the burst) in band 5
(1,000-1,450 MHz) with 2,048 channels spread over 400 MHz. The
GMRT detected a weak source with a flux density of 73 ± 17 μJy at the
GRB position. The flux should be considered as an upper limit, as the
contribution from the host has not been subtracted.


MeerKAT. The new MeerKAT radio observatory observed on 15 and
18 January 2019, with DDT requested by the ThunderKAT Large Survey
Project. Both epoch measurements used 63 antennas and were carried
out in the L-band, spanning 856 MHz and centred at 1,284 MHz. The
MeerKAT flux estimation was done by finding and fitting the source with
the software PyBDSF v.1.8.15. Adding the r.m.s. noise in quadrature
to the flux uncertainty leads to final flux measurements of 125 ± 14 μJy
per beam on 15 January and 97 ± 16 μJy per beam on 18 January. The
contribution from the host galaxy has not been subtracted. Therefore,
these measurements provide a maximum flux of the GRB. 
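
The quadrature sum used for the final flux uncertainty can be illustrated with a minimal Python sketch. The input values are hypothetical, since the individual fit uncertainty and image r.m.s. noise are not quoted here, and `combine_in_quadrature` is an illustrative helper name rather than part of PyBDSF.

```python
import math


def combine_in_quadrature(*errors):
    """Return the square root of the sum of squared, independent errors."""
    return math.sqrt(sum(e * e for e in errors))


# Hypothetical inputs in microjansky per beam (not the values used in the analysis).
fit_error_ujy = 10.0  # assumed uncertainty returned by the source fit
rms_noise_ujy = 10.0  # assumed r.m.s. noise of the image

total_error_ujy = combine_in_quadrature(fit_error_ujy, rms_noise_ujy)
print(f"total uncertainty: {total_error_ujy:.1f} uJy per beam")
```

With two equal contributions the total grows by a factor of √2, so neither term can be neglected when the fit error and the image noise are comparable.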


JCMT SCUBA-2. Sub-millimetre observations (Extended Data Table 5)
were performed simultaneously at 850 μm and 450 μm on three nights
using the Submillimetre Common-User Bolometer Array 2 (SCUBA-2)
continuum camera⁸⁴ on the James Clerk Maxwell Telescope (JCMT).
GRB 190114C was not detected in any of the individual measurements.
By combining all the SCUBA-2 continuum camera⁸⁴ observations, the
r.m.s. background noise is 0.95 mJy per beam at 850 μm and 5.4 mJy per
beam at 450 μm at 1.67 d after the burst trigger.


Prompt-emission model for the early-time MAGIC emission 
In the standard picture, the prompt sub-megaelectronvolt spectrum
is explained as synchrotron radiation from relativistic accelerated
electrons in the energy-dissipation region. The associated inverse
Compton component is sensitive to the details of the dynamics: for
example, in the internal shock model if the peak energy is initially 
very high and the inverse Compton component is suppressed owing 
to Klein-Nishina effects, the peak of the inverse Compton compo- 
nent may be delayed and become bright only at late times, when 
scattering occurs in the Thomson regime. Simulations showed that 
the magnetic fields required to produce the gigaelectronvolt-terae- 
lectronvolt component are rather low®, with ¢, ~ 10°. In this frame- 
work the contribution of the inverse Compton component to the 
observed flux at early times (62-90 s; see Extended Data Table 1) does 
not exceed ~20%. Alternatively, if the prompt emission originates 
in reprocessed photospheric emission, the early teraelectronvolt 
flux may arise from inverse Compton scattering of thermal photons 
by freshly heated electrons below the photosphere at low optical 
depths. Another possibility for the generation of teraelectronvolt 
photons might be the inverse Compton scattering of prompt meg- 
aelectronvolt photons by electrons in the external forward-shock 
region, where electrons are heated to an average Lorentz factor of
order $10^4$ at early times.


Afterglow model 

Synchrotron and SSC radiation from electrons accelerated at the for- 
ward shock were modelled within the external-shock scenario’”*”°*>*°, 
The results of the modelling are overlaid with the data in Fig. 3 and 
Extended Data Figs. 6, 7. 

We consider two types of power-law radial profiles $n(R) = n_0 R^{-s}$ for the
external environment: $s = 0$ (homogeneous medium) and $s = 2$ (wind-like
medium, typical of an environment shaped by the stellar wind of
the progenitor). In the latter case, we define $n_0 = 3 \times 10^{35} A_\star$ cm$^{-1}$, where
$A_\star$ is a parameter characterizing the normalization of the density.
We assume that electrons swept up by the shock are accelerated into
a power-law distribution described by the spectral index $p$, with
$dN/d\gamma \propto \gamma^{-p}$, where $\gamma$ is the electron Lorentz factor. We call $\nu_m$ the
characteristic synchrotron frequency of electrons with Lorentz factor
$\gamma_m$, $\nu_c$ the cooling frequency and $\nu_{sa}$ the synchrotron self-absorption
frequency.

The early-time optical emission (up to ~1,000s) and radio emission 
(up to ~10° s) are probably dominated by reverse-shock radiation’. The 
detailed modelling of this component is not discussed here, where we 
focus on forward-shock radiation. 

The XRT flux (Fig. 1, blue data points) decays as $F_X \propto t^{\alpha_X}$ with
$\alpha_X = -1.36 \pm 0.02$. If $\nu_X > \max(\nu_m, \nu_c)$, the X-ray light curve is predicted
to decay as $t^{(2-3p)/4}$, which implies $p \approx 2.5$. Another possibility is to assume
$\nu_m < \nu_X < \nu_c$, which implies $p = 2.1$–$2.2$ for $s = 2$ and $p \approx 2.8$ for $s = 0$. A
broken power law provides a better fit (5.3 x 10° probability of chance 
improvement), with a break occurring around $4 \times 10^4$ s and decay indices
of $\alpha_{X,1} = -1.32 \pm 0.03$ and $\alpha_{X,2} = -1.55 \pm 0.04$. This behaviour can be
explained by the passage of $\nu_c$ in the XRT band and assuming again
$p = 2.4$–$2.5$ for $s = 2$ and $p \approx 2.8$ for $s = 0$.
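
The values of $p$ quoted in this paragraph follow from inverting the relations between the temporal decay index and the electron index. The Python sketch below is a minimal illustration, not the fitting procedure used for the paper. It assumes the standard forward-shock closure relations: $\alpha = (2-3p)/4$ above both $\nu_m$ and $\nu_c$, $\alpha = (1-3p)/4$ for $\nu_m < \nu_X < \nu_c$ with $s = 2$, and $\alpha = 3(1-p)/4$ for the same ordering with $s = 0$. The last relation is not written out in the text and is an assumption here; the function and regime names are hypothetical.

```python
def electron_index_from_decay(alpha, regime):
    """Invert a closure relation alpha(p) to obtain the electron index p.

    regime:
      "above_cooling": nu_X > max(nu_m, nu_c), alpha = (2 - 3p)/4 (any s)
      "slow_wind":     nu_m < nu_X < nu_c, s = 2, alpha = (1 - 3p)/4
      "slow_uniform":  nu_m < nu_X < nu_c, s = 0, alpha = 3(1 - p)/4
    """
    if regime == "above_cooling":
        return (2.0 - 4.0 * alpha) / 3.0
    if regime == "slow_wind":
        return (1.0 - 4.0 * alpha) / 3.0
    if regime == "slow_uniform":
        return 1.0 - 4.0 * alpha / 3.0
    raise ValueError(f"unknown regime: {regime!r}")


# Single power-law fit of the XRT light curve.
for regime in ("above_cooling", "slow_wind", "slow_uniform"):
    print(regime, round(electron_index_from_decay(-1.36, regime), 2))

# Broken power-law fit, wind-like medium: before and after the break.
print(round(electron_index_from_decay(-1.32, "above_cooling"), 2))
print(round(electron_index_from_decay(-1.55, "slow_wind"), 2))
```

For the broken power law, both segments return a value of $p$ between 2.4 and 2.5 in the wind-like case, which is why a single electron index can describe the light curve on either side of the break.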

The optical light curve displays a shallow decay with time
(with temporal index poorly constrained, between −0.5 and −0.25)
starting from ~2 x 10’ s, followed by a steepening around $8 \times 10^4$ s, when
the temporal decay becomes similar to the decay in the X-ray band,
which suggests that after this time the X-ray and optical bands lie in
the same part of the synchrotron spectrum. If the break is interpreted
as the synchrotron characteristic frequency $\nu_m$ crossing the optical
band, after the break the observed temporal decay requires a steep
value of $p \approx 3$ for $s = 0$ and a value between $p = 2.4$ and $p = 2.5$ for $s = 2$.
Independently of the density profile of the external medium and of
the cooling regime of the electrons, $\nu_m \propto t^{-3/2}$, which implies that $\nu_m$ is
in the soft-X-ray band at 10’ s. The SED at ~100 s is indeed characterized
by a peak between 5 and 30 keV (Fig. 3). Information on the location of
the self-absorption frequency is provided by observations at 1 GHz,
showing that $\nu_{sa} \approx 1$ GHz at 10° s (Extended Data Fig. 6).

To summarize, in a wind-like scenario, X-ray and optical emission
and their evolution in time can be explained if $p = 2.4$–$2.5$ and the
emission, initially in the fast-cooling regime, transitions to a slow-cooling
regime around 3 x 10° s. The optical spectral index at late times
is predicted to be $(1 - p)/2 \approx -0.72$, in agreement with observations. $\nu_m$
crosses the optical band at $t \approx 8 \times 10^4$ s, explaining the steepening of
the optical light curve and the flattening of the optical spectrum. The
X-ray band initially lies above (or close to) $\nu_m$, and the break frequency
$\nu_c$ starts crossing the X-ray band around $(2\text{–}4) \times 10^4$ s, producing the
steepening in the decay rate (the cooling frequency increases with
time for $s = 2$). In this case, before the temporal break, the decay rate
is related to the spectral index of the electron energy distribution by
$\alpha_{X,1} = (2 - 3p)/4 \approx -1.3$ for $p = 2.4$–$2.5$. Well after the break, this value
of $p$ predicts a decay rate of $\alpha_{X,2} = (1 - 3p)/4$, between $\alpha_{X,2} = -1.55$ and
$\alpha_{X,2} = -1.62$. Overall, this interpretation is also consistent with the fact
that the late-time (t>10°s) X-ray and optical light curves display similar
temporal decays (Fig. 1), as they lie in the same part of the synchrotron
spectrum ($\nu_m < \nu_{\rm opt} < \nu_X < \nu_c$). A similar picture can be invoked to
explain the emission when assuming a homogeneous density medium,
but a steeper value of $p$ is required. In this case, however, no break is
predicted in the X-ray light curve.
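
As a quick numerical check of the wind-like scenario summarized above, the sketch below evaluates the three expressions quoted in the text at the two ends of the range $p = 2.4$–$2.5$. It is only an arithmetic illustration; the function name and dictionary keys are hypothetical.

```python
def wind_scenario_indices(p):
    """Evaluate the indices quoted in the text for a wind-like medium (s = 2)."""
    return {
        "beta_opt": (1.0 - p) / 2.0,        # late-time optical spectral index
        "alpha_x1": (2.0 - 3.0 * p) / 4.0,  # X-ray decay before the break
        "alpha_x2": (1.0 - 3.0 * p) / 4.0,  # X-ray decay well after the break
    }


for p in (2.4, 2.5):
    values = wind_scenario_indices(p)
    summary = ", ".join(f"{name} = {value:.3f}" for name, value in values.items())
    print(f"p = {p}: {summary}")
```

The post-break index spans −1.55 to −1.625 over this range of $p$, matching the interval quoted above and the measured $\alpha_{X,2} = -1.55 \pm 0.04$ at its lower end.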

We now add to the picture the information brought by the terae- 
lectronvolt detection. The model is built with reference to the MAGIC 
flux and spectral indices derived considering statistical errors only 
(see Extended Data Table 1 and green data points in Extended Data 
Fig. 2). The light curve decays in time as ¢**' and the photon index is 
consistent within ~1σ with /,, zy ~ -2.5 for the entire duration of the
emission, although there is evidence for an evolution from stronger 
(about —2) to weaker (about —2.8) values. In the first broadband SED 
(Fig. 3, 68-110 s), LAT observations provide strong evidence for the 
presence of two separated spectral peaks. 

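Assuming Thomson scattering, the SSC peak is given by:


$$\nu_{\rm peak}^{\rm SSC} \approx 2\gamma_e^2\,\nu_{\rm peak}^{\rm syn} \qquad (1)$$


whereas in the Klein–Nishina regime, the SSC peak should be located at:


$$h\nu_{\rm peak}^{\rm SSC} \approx \frac{2\gamma_e \Gamma m_e c^2}{1+z} \qquad (2)$$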

where $\gamma_e = \min(\gamma_c, \gamma_m)$. The synchrotron spectral peak is located at
$E_{\rm peak}^{\rm syn} \approx 10$ keV and the peak of the SSC component must be
$E_{\rm peak}^{\rm SSC} \lesssim 100$ GeV to explain the MAGIC photon index. Both the
Klein–Nishina and Thomson scattering regimes imply that $\gamma_e \lesssim 10^3$. This small
value presents two problems: (i) if the bulk Lorentz factor $\Gamma$ is larger
than 150 (which is a necessary condition to avoid strong γ-γ opacity;
see below), a small $\gamma_m$ translates into a small efficiency of the electron
acceleration, with $\epsilon_e < 0.05$; (ii) the synchrotron peak energy can be
located at $E_{\rm peak}^{\rm syn} \approx 10$ keV only for B210°G. A large $B$ and a small $\epsilon_e$
would make it difficult to explain the presence of a strong SSC emission.
These calculations show that γ-γ opacity probably plays a role in shaping
and softening the observed SSC spectra®*”.
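
The order of magnitude of the electron Lorentz factor implied by equations (1) and (2) can be checked with a short Python sketch. It takes the peak energies quoted above (10 keV and 100 GeV) and $\Gamma = 150$ from the text. The redshift $z = 0.4245$ is not given in this section and is assumed here, as are the function names.

```python
import math

M_E_C2_EV = 0.511e6  # electron rest energy in eV


def gamma_e_thomson(e_ssc_ev, e_syn_ev):
    """Solve equation (1), E_SSC = 2 * gamma_e**2 * E_syn, for gamma_e."""
    return math.sqrt(e_ssc_ev / (2.0 * e_syn_ev))


def gamma_e_klein_nishina(e_ssc_ev, bulk_gamma, z):
    """Solve equation (2), E_SSC = 2 * gamma_e * Gamma * m_e c^2 / (1 + z)."""
    return e_ssc_ev * (1.0 + z) / (2.0 * bulk_gamma * M_E_C2_EV)


E_SYN_EV = 10e3    # synchrotron peak, about 10 keV (from the text)
E_SSC_EV = 100e9   # upper bound on the SSC peak, about 100 GeV (from the text)
BULK_GAMMA = 150   # bulk Lorentz factor quoted in the text
REDSHIFT = 0.4245  # assumed redshift, not stated in this section

print(f"Thomson:       gamma_e <~ {gamma_e_thomson(E_SSC_EV, E_SYN_EV):.0f}")
print(f"Klein-Nishina: gamma_e <~ "
      f"{gamma_e_klein_nishina(E_SSC_EV, BULK_GAMMA, REDSHIFT):.0f}")
```

Under these assumptions both regimes give upper bounds of order $10^3$ on $\gamma_e$, and the Klein–Nishina bound becomes smaller for larger $\Gamma$.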


For a γ-ray photon with energy $E_\gamma$, the $\tau_{\gamma\gamma}$ opacity is:


$$\tau_{\gamma\gamma}(E_\gamma) = \sigma_{\gamma\gamma}\,(R/\Gamma)\,n_t(E_\gamma) \qquad (3)$$


where $n_t = L_t/(4\pi R^2 c\,\Gamma E_t)$ is the density of target photons in the comoving
frame, $L_t$ is the luminosity and $E_t = (m_e c^2)^2\Gamma^2/[E_\gamma(1+z)^2]$ is the energy
of target photons in the observer frame ($c$, speed of light in vacuum).
Target photons for photons of energy $E_\gamma = 0.2$–$1$ TeV and for $\Gamma = 120$–$150$
have energies in the range 4–30 keV. When γ-γ absorption is relevant,
the emission from pairs can give a non-negligible contribution to the
radiative output.

To properly model all the physical processes that shape the broad- 
band radiation, we use a numerical code that solves the evolution of 
the electron distributions and derives the radiative output, taking 
into account the following processes: synchrotron and SSC losses, 
adiabatic losses, y-y absorption, emission from pairs and synchro- 
tron self-absorption®®. We find that for the parameters assumed in 
the proposed model (see below), the contribution from pairs to the 
emission is negligible. 

The MAGIC photon index (Extended Data Table 1) and its evolution 
with time constrain the SSC peak energy to <1 TeV at the beginning of 
the observations (Extended Data Table 1). In general, the internal opac- 
ity decreases with time and Klein-Nishina effects become less relevant. 
A possible softening of the spectrum with time, as the one suggested 
by the observations, requires that the spectral peak decreases with 
time and moves below the MAGIC energy range. In the slow-cooling 
regime, the SSC peak evolves to higher frequencies for a wind-like
medium and decreases very slowly ($\nu_{\rm peak}^{\rm SSC} \propto t^{-1/4}$) for a constant-density
medium (both in the Klein–Nishina and Thomson regimes). In the fast-cooling
regime the evolution is faster ($\nu_{\rm peak}^{\rm SSC} \propto t^{-1/2}$ to $t^{-9/4}$, depending
on the medium and regime).

We model the multi-band observations considering both s= 0 and 
s=2. The results are shown in Fig. 3, Extended Data Figs. 6, 7, where 
model curves are overlaid with observations. The model curves shown 
in these figures are derived using the following parameters. For the 
model in Fig. 3 and in Extended Data Fig. 7 (solid and dashed curves):
$s = 0$, $\epsilon_e = 0.07$, $\epsilon_B$ = 8 x 10°, $p = 2.6$, $n_0 = 0.5$ and $E_k$ = 8 x 10” erg. For the
dotted curves in Extended Data Fig. 7 and the SEDs in Extended Data
Fig. 6: $s = 2$, $\epsilon_e = 0.6$, $\epsilon_B$ = 10%, $p = 2.4$, $A_\star = 0.1$ and $E_k$ = 4 x 10” erg.

Using the constraints on the afterglow onset time (rn =5-10 s, 
from the smooth component detected during the prompt emission) 
the initial bulk Lorentz factor is constrained to values $\Gamma_0 \approx 300$ and
$\Gamma_0 \approx 700$ for $s = 2$ and $s = 0$, respectively.

Consistently with the qualitative description above, we find that
late-time optical observations can indeed be explained with $\nu_m$ crossing
the band (see the SED modelling in Extended Data Fig. 6 and the dotted
curves in Extended Data Fig. 7). However, a large $\nu_m$ is required in this
case and consequently the peak of the SSC component would also be
large and lie above the MAGIC energy range. The resulting MAGIC light
curve (green dotted curve in Extended Data Fig. 7) does not agree with
observations. By relaxing the requirement on $\nu_m$, the teraelectronvolt
spectra (Fig. 3) and light curve (green solid curve in Extended Data
Fig. 7) can be explained. As noted, a wind-like medium can explain the
steepening of the X-ray light curve at $8 \times 10^4$ s, whereas no steepening
is expected in a homogeneous medium (blue dotted and solid lines in
Extended Data Fig. 7). We find that the gigaelectronvolt flux detected
by LAT at a late time ($t \approx 10^4$ s) is dominated by the SSC component
(dashed line in Extended Data Fig. 7).


Data availability 
Data are available from the corresponding authors upon request. 


Code availability 


Proprietary data reconstruction codes were generated at the MAGIC 
telescope large-scale facility. Information supporting the findings of 
this study is available from the corresponding authors upon request. 
Source data for Figs. 2, 3 are provided with the paper. 


32. Hamburg, R. GRB 190114C: Fermi GBM detection. GCN Circulars 23707
https://gcn.gsfc.nasa.gov/gcn3/23707.gcn3 (2019).

33. Kocevski, D. et al. GRB 190114C: Fermi-LAT detection. GCN Circulars 23709
https://gcn.gsfc.nasa.gov/gcn3/23709.gcn3 (2019).

34. Gropp, J. D. GRB 190114C: Swift detection of a very bright burst with a bright optical
counterpart. GCN Circulars 23688 https://gcn.gsfc.nasa.gov/gcn3/23688.gcn3 (2019).

35. Ursi, A. et al. GRB 190114C: AGILE/MCAL detection. GCN Circulars 23712
https://gcn.gsfc.nasa.gov/gcn3/23712.gcn3 (2019).

36. Frederiks, D. et al. Konus-Wind observation of GRB 190114C. GCN Circulars 23737
https://gcn.gsfc.nasa.gov/gcn3/23737.gcn3 (2019).

37. Minaev, P. & Pozanenko, A. GRB 190114C: SPI-ACS/INTEGRAL extended emission
detection. GCN Circulars 23714 https://gcn.gsfc.nasa.gov/gcn3/23714.gcn3 (2019).

38. Xiao, S. et al. GRB 190114C: Insight-HXMT/HE detection. GCN Circulars 23716
https://gcn.gsfc.nasa.gov/gcn3/23716.gcn3 (2019).

39. Tavani, M. et al. The AGILE mission. Astron. Astrophys. 502, 995-1013 (2009). 

40. Goldstein, A. et al. The Fermi GBM gamma-ray burst spectral catalog: the first two years. 
Astrophys. J. Suppl. Ser. 199, 19 (2012). 

41. Meegan, C. et al. The Fermi Gamma-ray Burst Monitor. Astrophys. J. 702, 791-804 
(2009). 

42. Barthelmy, S. D. et al. The Burst Alert Telescope (BAT) on the SWIFT Midex Mission. Space 
Sci. Rev. 120, 143-164 (2005). 

43. Atwood, A. A. et al. The Large Area Telescope on the Fermi gamma-ray space telescope 
mission. Astrophys. J. 697, 1071-1102 (2009). 

44. Aleksić, J. et al. The major upgrade of the MAGIC telescopes, part II: a performance study
using observations of the Crab Nebula. Astropart. Phys. 72, 76-94 (2016). 

45. Ahnen, M. L. et al. Performance of the MAGIC telescopes under moonlight. Astropart. 
Phys. 94, 29-41 (2017). 

46. Dominguez, A. et al. Extragalactic background light inferred from AEGIS galaxy-SED-type 
fractions. Mon. Not. R. Astron. Soc. 410, 2556-2578 (2011). 

47. Franceschini, A., Rodighiero, G. & Vaccari, M. Extragalactic optical-infrared background 
radiation, its time evolution and the cosmic photon-photon opacity. Astron. Astrophys. 
487, 837-852 (2008). 

48. Finke, J. D., Razzaque, S. & Dermer, C. D. Modeling the extragalactic background light 
from stars and dust. Astrophys. J. 712, 238-249 (2010). 

49. Gilmore, R. C., Somerville, R. S., Primack, J. R. & Dominguez, A. Semi-analytic modelling 
of the extragalactic background light and consequences for extragalactic gamma-ray 
spectra. Mon. Not. R. Astron. Soc. 422, 3189-3207 (2012). 

50. UK Swift Science Data Centre. GRB 190114C Swift-XRT light curve https://www.swift. 
ac.uk/xrt_curves/00883832/. 

51. Evans, P. A. et al. Methods and results of an automatic analysis of a complete sample of 
Swift-XRT observations of GRBs. Mon. Not. R. Astron. Soc. 397, 1177-1201 (2009). 

52. Greiner, J. et al. GROND—a 7-channel imager. Publ. Astron. Soc. Pacif. 120, 405-424 
(2008). 

53. Tody, D. in Astronomical Data Analysis Software and Systems II, ASP Conference Series 
Vol. 52 (eds Hanisch, R. J. et al.) 173-183 (1993). 

54. Krühler, T. et al. The 2175 Å dust feature in a gamma-ray burst afterglow at redshift 2.45.
Astrophys. J. 685, 376-383 (2008). 

55. Bolmer, J. et al. Dust reddening and extinction curves toward gamma-ray bursts at z > 4. 
Astron. Astrophys. 609, A62 (2018). 

56. Castro-Tirado, A. J. et al. A very sensitive all-sky CCD camera for continuous recording of 
the night sky. In Proc. SPIE, Advanced Software and Control for Astronomy II Vol. 7019 
(SPIE, 2008). 

57. Cepa, J. et al. OSIRIS tunable imager and spectrograph. In Proc. SPIE Optical and IR
Telescope Instrumentation and Detectors Vol. 4008 (eds Iye, M. & Moorwood, A. F.)
623-631 (SPIE, 2000).

58. Castro-Tirado, A. GRB 190114C: refined redshift by the 10.4m GTC. GCN Circulars 23708
https://gcn.gsfc.nasa.gov/gcn3/23708.gcn3 (2019).


59. de Ugarte Postigo, A. et al. The distribution of equivalent widths in long GRB afterglow
spectra. Astron. Astrophys. 548, A11 (2012). 

60. Steele, I. A. et al. The Liverpool Telescope: performance and first results. In Proc. SPIE
Ground-based Telescopes Vol. 5489 (ed. Oschmann, J. M. Jr) 679-692 (SPIE, 2004). 

61. Chambers, K. C. et al. The Pan-STARRS1 surveys. Preprint at https://arxiv.org/ 
abs/1612.05560 (2016). 

62. Tarenghi, M. & Wilson, R. N. The ESO NTT (New Technology Telescope): the first active 
optics telescope. In Proc. SPIE Active Telescope Systems Vol. 1114 (ed. Roddier, F. J.) 302- 
313 (SPIE, 1989). 

63. Smartt, S. J. et al. PESSTO: survey description and products from the first data release by 
the Public ESO Spectroscopic Survey of Transient Objects. Astron. Astrophys. 579, A40 
(2015). 

64. Covino, S. et al. REM: a fully robotic telescope for GRB observations. In Proc. SPIE 
Ground-based Instrumentation for Astronomy Vol. 5492 (eds Moorwood, A. F. M. & Iye, M.)
1613-1622 (SPIE, 2004). 

65. Roming, P. W. A. et al. The Swift ultra-violet/optical telescope. Space Sci. Rev. 120, 95-142 
(2005). 

66. Siegel, M. H. & Gropp, J. D. GRB 190114C: Swift/UVOT detection. GCN Circulars 23725
https://gcn.gsfc.nasa.gov/gcn3/23725.gcn3 (2019).

67. Poole, T. S. et al. Photometric calibration of the Swift ultraviolet/optical telescope. Mon. 
Not. R. Astron. Soc. 383, 627-645 (2008). 

68. Breeveld, A. A. et al. An updated ultraviolet calibration for the Swift/UVOT. In American 
Institute of Physics Conference Series Vol. 1358, 373-376 (AIP, 2011). 

69. Kuin, N. P. M. et al. Calibration of the Swift-UVOT ultraviolet and visible grisms. Mon. Not. 
R. Astron. Soc. 449, 2514-2538 (2015). 

70. Arnouts, S. et al. Measuring and modelling the redshift evolution of clustering: the 
Hubble Deep Field North. Mon. Not. R. Astron. Soc. 310, 540-556 (1999). 

71. Ilbert, O. et al. Accurate photometric redshifts for the CFHT legacy survey calibrated 
using the VIMOS VLT deep survey. Astron. Astrophys. 457, 841-856 (2006). 

72. Covino, S. et al. Dust extinctions for an unbiased sample of gamma-ray burst afterglows. 
Mon. Not. R. Astron. Soc. 432, 1231-1244 (2013). 

73. Schlafly, E. F. & Finkbeiner, D. P. Measuring reddening with Sloan Digital Sky Survey stellar 
spectra and recalibrating SFD. Astrophys. J. 737, 103 (2011). 

74. McMullin, J. P., Waters, B., Schiebel, D., Young, W. & Golap, K. CASA architecture and 
applications. In Astronomical Data Analysis Software and Systems XVI, Vol. 376 (eds Shaw, 
R.A. et al.) 127 (ASP, 2007). 

75. Wilson, W. E. et al. The Australia Telescope Compact Array broad-band backend: 
description and first results. Mon. Not. R. Astron. Soc. 416, 832-856 (2011). 

76. Sault, R. J., Teuben, P. J. & Wright, M. C. H. A retrospective view of MIRIAD. In Astronomical 
Data Analysis Software and Systems IV Vol. 77 (eds Shaw, R. A. et al.) 433 (ASP, 1995). 

77. Swarup, G. et al. The Giant Metre-wave Radio Telescope. Current Science 60, 95-105 
(1991). 

78. Cherukuri, S. V. et al. GRB 190114C: GMRT detection at 1.26GHz. GCN Circulars 23762
https://gcn.gsfc.nasa.gov/gcn3/23762.gcn3 (2019).

79. Tremou, L. et al. GRB 190114C: MeerKAT radio observation. GCN Circulars 23760
https://gcn.gsfc.nasa.gov/gcn3/23760.gcn3 (2019).

80. Camilo, F. et al. Revival of the magnetar PSR J1622-4950: observations with MeerKAT, 
Parkes, XMM-Newton, Swift, Chandra, and NuSTAR. Astrophys. J. 856, 180 (2018). 

81. Jonas, J. L. & The MeerKAT Team. The MeerKAT Radio Telescope. In Proc. of MeerKAT 
Science: On the Pathway to the SKA 001 (2016). 

82. Fender, R. et al. ThunderKAT: the MeerKAT large survey project for image-plane radio 
transients. Preprint at https://arxiv.org/abs/1711.04132 (2017). 

83. Mohan, N. & Rafferty, D. PyBDSF: Python Blob Detection and Source Finder
https://www.astron.nl/citt/pybdsf/ (2015).

84. Holland, W. S. et al. SCUBA-2: the 10 000 pixel bolometer camera on the James Clerk 
Maxwell Telescope. Mon. Not. R. Astron. Soc. 430, 2513-2533 (2013). 

85. Bošnjak, Ž., Daigne, F. & Dubus, G. Prompt high-energy emission from gamma-ray bursts
in the internal shock model. Astron. Astrophys. 498, 677-703 (2009). 

86. Panaitescu, A. & Kumar, P. Analytic light curves of gamma-ray burst afterglows: 
homogeneous versus wind external media. Astrophys. J. 543, 66-76 (2000). 

87. Derishev, E. & Piran, T. The physical conditions of the afterglow implied by MAGIC’s sub- 
TeV observations of GRB 190114C. Astrophys. J. Lett. 880, 27 (2019). 

88. Mastichiadis, A. & Kirk, J. G. Self-consistent particle acceleration in active galactic nuclei. 
Astron. Astrophys. 295, 613 (1995). 

89. Vurm, I. & Poutanen, J. Time-dependent modeling of radiative processes in hot 
magnetized plasmas. Astrophys. J. 698, 293-316 (2009). 

90. Petropoulou, M. & Mastichiadis, A. On the multiwavelength emission from gamma ray 
burst afterglows. Astron. Astrophys. 507, 599-610 (2009). 

91. Pennanen, T., Vurm, I. & Poutanen, J. Simulations of gamma-ray burst afterglows with a 
relativistic kinetic code. Astron. Astrophys. 564, A77 (2014). 


Acknowledgements We thank the Instituto de Astrofisica de Canarias for the excellent 
working conditions at the Observatorio del Roque de los Muchachos in La Palma. We 
acknowledge financial support by the German BMBF and MPG, the Italian INFN and INAF, the 
Swiss National Fund SNF, the ERDF under the Spanish MINECO (FPA2017-87859-P, FPA2017- 
85668-P, FPA2017-82729-C6-2-R, FPA2017-82729-C6-6-R, FPA2017-82729-C6-5-R, AYA2015- 
71042-P, AYA2016-76012-C3-1-P, ESP2017-87055-C2-2-P, FPA201790566REDC), the Indian 
Department of Atomic Energy, the Japanese JSPS and MEXT, the Bulgarian Ministry of 
Education and Science, National RI Roadmap Project DO1-153/28.08.2018 and the Academy of 
Finland grant number 320045. This work was also supported by the Spanish Centro de 
Excelencia ‘Severo Ochoa’ through grants SEV-2016-0588 and SEV-2015-0548 and Unidad de 
Excelencia ‘Maria de Maeztu’ MDM-2014-0369, by the Croatian Science Foundation (HrZZ) 
Project IP-2016-06-9782 and the University of Rijeka Project 13.12.1.3.02, by the DFG 
Collaborative Research Centers SFB823/C4 and SFB876/C3, the Polish National Research 
Centre grant UMO-2016/22/M/ST9/00382 and by the Brazilian MCTIC, CNPq and FAPERJ. 

L. Nava acknowledges funding from the European Union's Horizon 2020 Research and
Innovation programme under the Marie Skłodowska-Curie grant agreement number 664931.
E. Moretti acknowledges funding from the European Union's Horizon 2020 research and
innovation programme under Marie Skłodowska-Curie grant agreement number 665919. This
study used the following ALMA data: ADS/JAO.ALMA#2018.A.00020.T, ADS/JAO. 
ALMA#2018.1.01410.T. ALMA is a partnership of ESO (representing its member states), NSF 
(USA) and NINS (Japan), together with NRC (Canada), MOST and ASIAA (Taiwan), and KASI 
(Republic of Korea), in cooperation with the Republic of Chile. The Joint ALMA Observatory is 
operated by ESO, AUI/NRAO and NAOJ. C.C.T., A.d.U.P. and D.A.K. acknowledge support from
the Spanish research project AYA2017-89384-P. C.C.T. and A.d.U.P. acknowledge support from
funding associated with Ramón y Cajal fellowships (RyC-2012-09984 and RyC-2012-09975).
D.A.K. acknowledges support from funding associated with Juan de la Cierva Incorporación
fellowships (IJCI-2015-26153). The JCMT is operated by the East Asian Observatory on behalf of 
The National Astronomical Observatory of Japan, Academia Sinica Institute of Astronomy and 
Astrophysics, the Korea Astronomy and Space Science Institute, and Center for Astronomical 
Mega-Science (as well as the National Key R&D Program of China via grant number 
2017YFA0402700). Additional funding support is provided by the Science and Technology 
Facilities Council of the UK and participating universities in the UK and Canada. The JCMT data 
reported here were obtained under project M18BP040 (principal investigator D.A.P.). We thank 
M. Rawlings, K. Silva, S. Urquart and the JCMT staff for support for these observations. The 
Liverpool Telescope, located on the island of La Palma, in the Spanish Observatorio del Roque 
de los Muchachos of the Instituto de Astrofisica de Canarias, is operated by Liverpool John 
Moores University with financial support from the UK Science and Technology Facilities 
Council. The Australia Telescope Compact Array is part of the Australia Telescope National 
Facility, which is funded by the Australian Government for operation as a National Facility 
managed by CSIRO. G.E.A. is the recipient of an Australian Research Council Discovery Early 
Career Researcher Award (project number DE180100346) and J.C.A.M.-J. is the recipient of an 
Australian Research Council Future Fellowship (project number FT140101082) funded by the 
Australian Government. Support for the German contribution to GBM was provided by the 
Bundesministerium für Bildung und Forschung (BMBF) via the Deutsches Zentrum für Luft und
Raumfahrt (DLR) under grant number 50 QV 0301. The University of Alabama in Huntsville
(UAH) coauthors acknowledge NASA funding from cooperative agreement NNM11AA01A.
C.A.W.-H. and C.M.H. acknowledge NASA funding through the Fermi-GBM project. The Fermi 
LAT Collaboration acknowledges support from a number of agencies and institutes that have 
supported both the development and the operation of the LAT, as well as scientific data 
analysis. These include the National Aeronautics and Space Administration and the 
Department of Energy (DOE) in the USA; the Commissariat à l’Énergie Atomique and the
Centre National de la Recherche Scientifique/Institut National de Physique Nucléaire et de 
Physique des Particules in France; the Agenzia Spaziale Italiana and the Istituto Nazionale di 
Fisica Nucleare in Italy; the Ministry of Education, Culture, Sports, Science and Technology 
(MEXT), High Energy Accelerator Research Organization (KEK) and Japan Aerospace 
Exploration Agency (JAXA) in Japan; and the K. A. Wallenberg Foundation, the Swedish 
Research Council and the Swedish National Space Board in Sweden. We acknowledge 
additional support for science analysis during the operations phase from the Istituto Nazionale 
di Astrofisica in Italy and the Centre National d'Etudes Spatiales in France. This work was 
performed in part under DOE contract DE-AC02-76SF00515. Part of the funding for GROND
(both hardware and personnel) was granted from the Leibniz-Prize to G. Hasinger (DFG grant 
HA 1850/28-1). Swift data were retrieved from the Swift archive at HEASARC/NASA-GSFC and 
from the UK Swift Science Data Centre. Support for Swift in the UK is provided by the UK Space 
Agency. This work is based on observations obtained with XMM-Newton, an ESA science 
mission with instruments and contributions directly funded by ESA Member States and NASA. 
This work is partially based on observations collected at the European Organisation for 
Astronomical Research in the Southern Hemisphere under ESO programme 199.D-0143. The 
work is partly based on observations made with the GTC, installed in the Spanish Observatorio 
del Roque de los Muchachos of the Instituto de Astrofisica de Canarias, in the island of La 
Palma. This work is partially based on observations made with the NOT (programme 58-502), 
operated by the Nordic Optical Telescope Scientific Association at the Observatorio del Roque 
de los Muchachos, La Palma, Spain, of the Instituto de Astrofisica de Canarias. This work is 
partially based on observations collected at the European Organisation for Astronomical 
Research in the Southern Hemisphere under ESO programme 102.D-0662. This work is 
partially based on observations collected through the ESO programme 199.D-0143 ePESSTO. 
M. Gromadzki is supported by the Polish NCN MAESTRO grant 2014/14/A/ST9/00121. M.N. is
supported by a Royal Astronomical Society Research Fellowship. M.G.B., S. Campana,
A. Melandri and P.D’A. acknowledge ASI grant I/004/11/3. S. Campana acknowledges support
from agreement ASI-INAF number 2017-14-H.0. S.J.S. acknowledges funding from STFC grant
ST/PO000312/1. N.P.M.K. acknowledges support by the UK Space Agency under grant
ST/P002323/1 and the UK Science and Technology Facilities Council under grant ST/NO0811/1.
L.P. and S. Lotti acknowledge partial support from agreement ASI-INAF number 2017-14-H.0.
A.F.V. acknowledges RFBR 18-29-21030 for support. A.J.C.-T. acknowledges support from the
Junta de Andalucía (Project P07-TIC-03094) and from the Spanish Ministry Projects
AYA2012-39727-C03-01 and 2015-71718R. K. Misra acknowledges support from the Department of
Science and Technology (DST), Government of India and the Indo-US Science and Technology
Forum (IUSSTF) for the WISTEMM fellowship and Department of Physics, UC Davis, where a
part of this work was carried out. S.B.P. and K. Misra acknowledge BRICS (Brazil, Russia, India, 
China and South Africa) grant DST/IMRCD/BRICS/Pilotcall/ProFCheap/2017(G) for this work. 
M.J.M. acknowledges the support of the National Science Centre, Poland, through grant 
2018/30/E/ST9/00208. V.J. and L.R. acknowledge support from grant EMR/2016/007127 from 
the Department of Science and Technology, India. K. Maguire acknowledges support from 
H2020 through an ERC starting grant (758638). L.I. acknowledges M. Della Valle for support in
the operation of the telescope. 


Author contributions The MAGIC telescope system was designed and constructed by the 
MAGIC Collaboration. Operation, data processing, calibration, Monte Carlo simulations of the 
detector and of theoretical models, and data analyses were performed by the members of the 
MAGIC Collaboration, who also discussed and approved the scientific results. L. Nava 
coordinated the collection of the data, developed the theoretical interpretation and wrote the 
main section and the section on afterglow modelling. E. Moretti coordinated the analysis of 
the MAGIC data, wrote the relevant sections and, together with F. Longo, coordinated the
collaboration with the Fermi team. D. Miceli, Y.S. and S.F. performed the analysis of the MAGIC
data. S. Covino provided support with the analysis of the optical data and the writing of the
corresponding sections. Z.B. performed calculations for the contribution of prompt emission 
to the teraelectronvolt radiation and wrote the corresponding section. A. Stamerra, D.P. and 
S.I. contributed to structuring and editing the paper. A. Berti contributed to editing and 
finalizing the manuscript. R.M. coordinated and supervised the writing of the paper. All MAGIC 
collaborators contributed to the editing of and provided comments on the final version of the 
manuscript. S. Campana and M.G.B. extracted the spectra and performed the spectral analysis 
of the Swift-BAT and Swift-XRT data. N.P.M.K. derived the photometry for the Swift-UVOT event 
mode data and the UV grism exposure. M.H.S. derived the image-mode Swift-UVOT 
photometry. A.d.U.P. was principal investigator of ALMA programme 2018.A.00020.T,
triggered these observations and performed photometry. S. Martin reduced the ALMA Band 6
data. C.C.T., S. Schulze, D.A.K. and M. Michałowski participated in the ALMA DDT proposal
preparation, observations and scientific analysis of the data. D.A.P. was principal investigator
of ALMA programme 2018.1.01410.T, triggered these observations and was principal
investigator of the LT and JCMT programmes. A.M.C. analysed the ALMA Band 3 and LT data 
and wrote the LT text. S. Schulze contributed to the development of the ALMA Band 3 
observing programme. I.A.S. triggered the JCMT programme, analysed the data and wrote the 
associated text. N.R-T. contributed to the development of the JCMT programme. D.A.K. and 
C.C.T. triggered and coordinated the X-shooter observations. D.A.K. independently checked 
the optical light curve analysis. K. Misra was the principal investigator of the GMRT programme 
35_018. S.C. and V.J. analysed the data. L.R. contributed to the observation plan and data 
analysis. E. Tremou, I.H. and R.D. performed the MeerKAT data analysis. G.E.A., A. Moin, 
S. Schulze and E. Troja were principal investigators of ATCA programme CX424. G.E.A., 
M. Wieringa and J. Stevens carried out the observations. G.E.A., G. Bernardi, S.K., M. Marongiu, 
A. Moin, R.R. and M. Wieringa analysed these data. J.C.A.M.-J. and L.P. participated in the ATCA 
proposal preparation and the scientific analysis of the data. The ePESSTO project was 
delivered by the following, who contributed to managing, executing, reducing, analysing ESO/ 
NTT data and provided comments to the manuscript: J.P.A., N.C.S., P.D’A., M. Gromadzki, C.., 
E.K., K. Maguire, M.N., F.R. and S.J.S.; A. Melandri and A. Rossi reduced and analysed REM data 
and provided comments to the manuscript. J. Bolmer was responsible for observing the GRB 
with GROND and for the data reduction and calibration. J. Bolmer and J. Greiner contributed to 
the analysis of the data and writing of the text. E. Troja triggered the NuSTAR ToO observations 
performed under the DDT programme, L.P. requested the XMM-Newton data, obtained under a 
DDT programme, and carried out the scientific analysis of the XMM-Newton and NuSTAR data. 
S. Lotti analysed the NuSTAR data and wrote the associated text. A. Tiengo and G. Novara 
analysed the XMM-Newton data and wrote the associated text. A.J.C.-T. led the observing 
BOOTES and GTC programmes. A. Castellon, C.J.P.d.P., E.F.-G., I.M.C., S.B.P. and X.Y.L. analysed 
the BOOTES data, and A.F.V., M.D.C.-G., R.S.-R., Y.-D.H. and V.V.S. analysed the GTC data and 
interpreted them accordingly. N.R-T. created the X-shooter and ALFOSC figures. J.P.U.F. and J.J. 
performed the analysis of the X-shooter and ALFOSC spectra. D.X. and P.J. contributed to the 
NOT programme and triggering. D. Malesani performed photometric analysis of NOT data. 
E. Peretti contributed to the development of the code for modelling afterglow radiation. L.I. 
triggered and analysed the OASDG data, and A.D.D. and A.N. performed the observations at 
the telescope. 


Competing interests The authors declare no competing interests. 


Additional information 

Correspondence and requests for materials should be addressed to R.M. 

Peer review information Nature thanks Xiang-Yu Wang and the other, anonymous, reviewer(s) 
for their contribution to the peer review of this work. 

Reprints and permissions information is available at http://www.nature.com/reprints. 


Extended Data Fig. 1 | Prompt-emission light curves for different detectors. 
a-f, Light curves for Super-AGILE (a; 20-60 keV), Swift-BAT (b; 15-150 keV), 
Fermi-GBM (c; 10-1,000 keV), AGILE-MCAL (d; 0.4-1.4 MeV), AGILE-MCAL 
(e; 1.4-100 MeV) and Fermi-LAT (f; 0.1-10 GeV). The light curve of AGILE-MCAL 
is split into two bands to show the energy dependence of the first peak. Error 
bars show 1σ statistical errors. 
[Panel axes: counts per time bin (32 ms; counts per detector per 64 ms for BAT) against T − T0 [s].] 


Extended Data Fig. 2 | MAGIC time-integrated SEDs in the time interval 
62-2,400 s after T0. The green (yellow, blue) points and band show the results 
of the Monte Carlo (MC) simulations for the nominal and the varied light scale 
cases (+15%, -15%), which define the limits of the systematic uncertainties. The 
contour regions are drawn from the 1σ error of their best-fit power-law 
functions. The vertical bars of the data points show the 1σ errors on the flux. 


Extended Data Fig. 3 | Afterglow light curves of GRB 190114C. Flux density at 
different frequencies as a function of the time since the initial burst, T − T0. 
a, Observation in the NIR, optical and UV bands. The flux has been corrected 
for extinction in the host and in our Galaxy. The contribution of the host galaxy 
and its companion has been subtracted. Fluxes have been rescaled (except for 
the r-band filter). b, Radio and submillimetre observations from 1.3 GHz to 
670 GHz. ‘Instr.’, instrument. 


Extended Data Fig. 4 | Images of the localization region of GRB 190114C. 
a, All-sky image captured with the CASANDRA-1 camera at the BOOTES-1 
station. The image (30 s exposure, unfiltered) was taken at T0 + 14.8 s, and was 
severely affected by the moon. At the GRB 190114C location (red dot) no prompt 
optical emission is detected. Inset, magnification (inverted colours) 
containing a 10′-diameter circle centred on the optical position. b, Three- 
colour image of the host of GRB 190114C, obtained with the HST. The host 
galaxy is a spiral galaxy, and the green circle indicates the location of the 
transient close to its host nucleus. The image is 8″ across; north is up and east is 
to the left. c-e, Images of the GRB 190114C field taken with the HST, obtained 
with the F850LP filter (covering roughly the region from 800 to 1,100 nm). Two 
epochs, 11 February and 12 March 2019, are shown (images are 4″ across); the 
right-most image is the result of the difference image. A faint transient is visible 
close to the nucleus of the galaxy, and we identify this as the late-time afterglow 
of the burst. 


Extended Data Fig. 5 | Optical-NIR spectra of GRB 190114C. a, NOT/ALFOSC 
spectrum obtained at mid-time (i.e., the epoch corresponding to a half of the 
exposure length) 1 h post-burst. The continuum is afterglow-dominated at this 
… telluric absorption). b, Normalized GTC (+OSIRIS) spectrum obtained on 
14 January 2019, 23:32:03 UT with the R1000B and R2500I grisms. The emission 
lines of the underlying host galaxy are noticeable, besides the Ca II absorption 
lines in the afterglow spectrum. c, Visible-light region of the VLT-X-shooter 
… lines from the star-forming host galaxy. 


[Figure axes: flux density [mJy] against frequency [Hz], with one SED per epoch.] 


Extended Data Fig. 6 | SEDs from radio frequencies to X-rays at different 
epochs. The synchrotron frequency ν_m crosses the optical band, moving from 
higher to lower frequencies. The break between 10^8 and 10^10 Hz is caused by the 
self-absorption synchrotron frequency, ν_sa. Optical (X-ray) data have been 
corrected for extinction (absorption). The data points are taken from the 
following telescopes (from lower to higher frequencies): filled and empty 
triangle symbols, GMRT and MeerKAT; stars, ATCA; violet filled circle, ALMA; 
down arrows, JCMT 1σ upper limits; filled circles, LT (yellow) and GROND (all the 
other colours). Error bars for all data points define the 1σ error. Coloured 
stripes show the best fit of the XRT data extrapolated to the time of each SED. 
Their vertical width is obtained from the error (90% confidence level) on the 
best-fit normalization. Solid lines show the model SEDs for the case s = 2. 


Extended Data Fig. 7 | Modelling of broadband light curves. Modelling 
results of forward shock emission are compared to observations at different 
frequencies (see key). The model shown with solid and dashed lines is 
optimized to describe the high-energy radiation (teraelectronvolt, 
gigaelectronvolt and X-ray) and has been obtained with the following 
parameters: s = 0, ε_e = 0.07, ε_B = 8 × 10^-5, p = 2.6, n_0 = 0.5 and E_k = 8 × 10^53 erg. Solid 
lines show the total flux (synchrotron and SSC) and the dashed line refers to the 
SSC contribution only. Dotted curves correspond to a better modelling of 
observations at lower frequencies, but fail to explain the behaviour of the 
teraelectronvolt light curve; they are obtained with the following model 
parameters: s = 2, ε_e = 0.6, ε_B = 10^-4, p = 2.4, A_⋆ = 0.1 and E_k = 4 × 10^53 erg. Vertical 
bars on the data points show the 1σ errors on the flux, and horizontal bars 
represent the duration of the observation. 


Extended Data Table 1| MAGIC spectral-fit parameters for GRB 190114C 


Time bin [seconds after T0]    Normalisation [TeV^-1 cm^-2 s^-1]    Photon index    Pivot energy [GeV] 
62 — 90 ge say eem (Oa Se Warns 395.5 
68 — 180 1AORe ? 10-* -2.27 Os 404.7 
180 — 625 2.2643) - 10-8 -2.56 O87 395.5 
68 — 110 1.74018 «1077 2.1602? 386.5 
110 — 180 eso -10-% 25104! 395.5 
180 — 360 S500 310 2 2.36027 395.5 
360 — 625 1.65378 -10-° BIG a 369.1 
625 — 2400 aso ear? +2.80 122 369.1 
62 — 2400 (Nominal MC) 1.071098 10-8 = -2,5 4 #020 423.8 
62 — 2400 (Light scale +15% MC) —7.95#°58 . 10-8 -2.91 1028 369.1 
62 — 2400 (Light scale -15% MC) —-1.34#°.°8 . 49-8 07 ie 509.5 


For each time bin, the table shows the start and end time of the bin, the normalization factor of the EBL-corrected differential flux at the pivot energy with statistical errors, photon indices with 
statistical errors, and the pivot energy of the fit (fixed). 
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Extended Data Table 2| GROND photometry 
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Time Toronp after the BAT trigger. The AB magnitudes are not corrected for Galactic foreground reddening. 
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Extended Data Table 3 | LT, NOT and UVOT observations 


UTC Filter Exposure (s) Magnitude 
LT/IO:O 
2019-01-14.975 g 45 19.08±0.06 
2019-01-14.976 r 45 18.22±0.02 
2019-01-14.977 i 45 17.49±0.02 
2019-01-14.978 z 45 17.12±0.02 
2019-01-14.979 B 45 19.55±0.15 
2019-01-14.980 V 45 18.81±0.08 
2019-01-15.814 r 60 19.61±0.05 
2019-01-15.818 z 60 
2019-01-15.820 i 60 
2019-01-15.823 g 60 
NOT/ALFOSC 
2019-01-14.89127 g 1 × 300 17.72±0.03 
2019-01-14.89512 r 1 × 300 16.93±0.02 
2019-01-14.89899 i 1 × 300 16.42±0.04 
2019-01-14.90286 z 1 × 300 16.17±0.04 
2019-01-23.8896 i 6 × 300 21.02±0.05 
UVOT 
Tstart Tstop Filter Magnitude Tstart Tstop Filter Magnitude 
56.63 57.63 Vv 130958 142524 UvM2 20.37 
57.63 58.63 Vv 217406 222752 UvM2 20.48 
58.63 59.63 Vv 107573 125233 U 
59.63 60.63 Vv 205500 210750 U 
60.63 61.63 Vv 291188 302718 U 
61.63 62.63 Vv 400429 412385 U 
62.63 63.63 Vv 616 627 Vv 
615.95 625.95 Vv 16295 17136 Vv 
73.34 83.34 white 26775 27682 Vv 
83.34 93.34 white 39149 57221 Vv 
93.34 103.34 white 108064 125736 Vv 
103.34 113.34 white 206689 211356 Vv 
113.34 123.34 white 292383 303996 Vv 
123.34 133.34 white 401305 413316 Vv 
133.34 143.34 white 4044 51522 uv 
143.34 153.34 white 131216 142656 Uuvw1 
153.34 163.34 white 217984 223056 Uuvw1 
163.34 173.34 white 592 612 uvw2 
173.34 183.34 white 6056 56384 Uuvw2 
183.34 193.34 white 130699 142346 Uuvw2 
193.34 203.34 white 216828 222404 uvw2 
562.0 572.0 white 566 586 white 
572.0 582.0 white 607389 613956 white 
535.5 555.5 B 624452 682416 white 
545.5 565.5 B 745033 769296 white 
285.9 305.9 U 818840 837216 white 
305.9 325.9 U 893522 907116 white 
325.9 345.9 U 991065 1004196 white 
345.9 365.9 U 1077542 1094616 white 
365.9 385.9 U 1140343 1170336 white 
385.9 405.9 U 1220661 1274376 white 
405.9 425.9 U 5851 6050 white 
425.9 445.9 U 21950 22857 white 
445.9 465.9 U 1353459 1359284 white 
465.9 485.9 U 1502211 1548336 white 
485.9 505.9 U 1692292 1703935 white 
505.9 525.9 U 2132978 2146056 white 
542 561 B 2299521 2317956 white 
5646 5845 B 63686 80942 white 
21038 46521 B 107900 125591 white 
62774 96486 B 206292 211137 white 
107737 125412 B 291984 303556 white 
205896 210944 B 401012 413029 white 
291586 303137 B 491973 505356 white 
400721 412707 B 74 224 white 
3839 50615 UVM2 


Magnitudes are SDSS ‘AB-like’ for ugriz and ‘Vega-like’ for all the other filters, and they are not 
corrected for Galactic extinction. For the UVOT data, magnitudes without uncertainties are 
upper limits. 
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Extended Data Table 4 | Observations of the host galaxy 


Filter Host Companion Combined 
Sloan u 23.54 25.74 23.40 
Sloan g 22.51 23.81 22.21 
Sloan r 22.13 22.81 21.66 
Sloan i 21.70 22.27 21.19 
Sloan z 21.51 21.74 20.87 
2MASS J 20.98 21.08 20.28 
2MASS H 20.68 20.82 20.00 
2MASS Ks 20.45 20.61 19.77 


For each filter, the estimated magnitudes are given for the host galaxy of GRB 190114C, the 
companion and the combination of the two objects. 


Extended Data Table 5 | Observations of GRB 190114C by ATCA and JCMT SCUBA-2 


ATCA 
Start date and time    End date and time     Frequency (GHz)    Flux (mJy) 
1/16/2019 6:47:00      1/16/2019 10:53:00    5.5                1.92±0.06 
                                             9                  1.78±0.06 
                                             18                 2.62±0.26 
1/18/2019 1:45:00      1/18/2019 11:18:00    5.5                1.13±0.04 
                                             9                  1.65±0.05 
                                             18                 2.52±0.27 
                                             44                 1.52±0.15 
1/20/2019 3:38         1/20/2019 10:25:00    5.5                1.78±0.06 
                                             9                  2.26±0.07 
                                             18                 2.30±0.23 

JCMT SCUBA-2 
UT date       Time since trigger (days)    Time on source (hours)    Typical 225 GHz CSO opacity    Typical elevation (degrees)    850 μm RMS density (mJy/beam)    450 μm RMS density (mJy/beam) 
2019-01-15    0.338                        1.03                      0.026                          39                             1.7                              9.2 
2019-01-16    1.338                        1.03                      0.024                          39                             1.6                              8.4 
2019-01-18    3.318                        0.95                      0.031                          37                             1.7                              11.4 


For the ATCA data, the start and end dates and times (UTC) of the observations, the frequency and the flux (1σ error) are reported. For the JCMT SCUBA-2 data, the CSO 225-GHz opacity 
measures the zenith atmospheric attenuation. 

Gamma-ray bursts (GRBs) are brief flashes of γ-rays and are considered to be the most 
energetic explosive phenomena in the Universe’. The emission from GRBs comprises 
a short (typically tens of seconds) and bright prompt emission, followed by a much 
longer afterglow phase. During the afterglow phase, the shocked outflow (produced 
by the interaction between the ejected matter and the circumburst medium) slows 
down, and a gradual decrease in brightness is observed’. GRBs typically emit most of 
their energy via γ-rays with energies in the kiloelectronvolt-to-megaelectronvolt 
range, but a few photons with energies of tens of gigaelectronvolts have been 
detected by space-based instruments’. However, the origins of such high-energy 
(above one gigaelectronvolt) photons and the presence of very-high-energy (more 
than 100 gigaelectronvolts) emission have remained elusive*. Here we report 
observations of very-high-energy emission in the bright GRB 180720B deep in the GRB 
afterglow: ten hours after the end of the prompt emission phase, when the X-ray flux 
had already decayed by four orders of magnitude. Two possible explanations exist for 
the observed radiation: inverse Compton emission and synchrotron emission of 
ultrarelativistic electrons. Our observations show that the energy fluxes in the X-ray 
and γ-ray range and their photon indices remain comparable to each other 
throughout the afterglow. This discovery places distinct constraints on the GRB 
environment for both emission mechanisms, with the inverse Compton explanation 
alleviating the particle energy requirements for the emission observed at late times. 
The late timing of this detection has consequences for the future observations of 
GRBs at the highest energies. 


On 20 July 2018, GRB 180720B triggered the Fermi Gamma-ray Burst 
Monitor (GBM) at 14:21:39.65 universal time (UT)° (T0) and the Swift Burst 
Alert Telescope (BAT) 5 s later®. Multi-wavelength follow-up observations 
were performed up to T0 + 3 x 10° s by the European Southern 
Observatory’s Very Large Telescope, which measured a redshift of 
z = 0.653 (ref. ”). In the high-energy γ-ray band (100 MeV-100 GeV) 
this GRB was also detected by the Fermi Large Area Telescope (LAT) 
between T0 and T0 + 700 s with a maximum photon energy of 5 GeV 
at T0 + 142.4 s (ref. 8). No further high-energy emission was detected 
in the successive observation windows after 700 s. The prompt emission 
phase of GRB 180720B is extremely bright, ranking seventh in 
brightness among the over 2,650 GRBs detected by Fermi-GBM so far 
(see Methods). With a T90 (the time in which 90% of the flux is detected) 
of 48.9 ± 0.4 s, GRB 180720B is categorised as a long GRB (typically 
associated with the death of massive stars’), with an isotropic energy 
release of $E_{\mathrm{iso}} = (6.0 \pm 0.1) \times 10^{53}$ erg (50-300 keV; 1 erg = $10^{-7}$ J). Observations 
of this GRB took place using the Swift X-ray Telescope (XRT), 
identifying a bright afterglow that remained detectable until almost 
30 days after T0 (refs.’°"; Fig. 1). In terms of energy flux of the X-ray 
afterglow (0.3-10 keV, at T0 + 11 h), this GRB ranks second after the 
exceptional GRB 130427A°. 

Observations with the High Energy Stereoscopic System (H.E.S.S.) 
array began at T0 + 10.1 h and lasted for two hours. The data 
were analysed using methods optimized for the detection of the 
lowest-energy events, revealing a new γ-ray source (Fig. 2a) with 
an excess of 119 γ-ray events and a statistical significance of 5.3σ 
(5.0σ post-trial; see Methods). The γ-ray excess is well fitted by a point-like 
source model centred at a right ascension of 00 h 02 min 7.6 s and a declination 
of -02°56′06″ (J2000) with a statistical uncertainty of 1.31′, consistent 
with the measurements at other wavelengths®*”. To rule out the 
association of this source with an unknown steady γ-ray emitter (such 
as an active galactic nucleus) or persistent systematic effects, the GRB 
region was re-observed under similar conditions 18 days after these 
observations. In total, 6.75 h of data were analysed, resulting in a sky 
map consistent with background events (Fig. 2b). 

The flux spectrum detected by H.E.S.S. (100-440 GeV) was fitted with 
a function of the form $F_{\mathrm{obs}}(E) = F_{\mathrm{int}}(E) \times e^{-\tau(E,z)}$, where the exponential term 
accounts for the absorption of photons by the extragalactic background 
light, $\tau$ is the optical depth and $F_{\mathrm{int}}(E) = F_{0,\mathrm{int}} (E/E_{0,\mathrm{int}})^{-\gamma_{\mathrm{int}}}$ is a power law 
describing the intrinsic source emission. The analysis resulted in a photon 
index of $\gamma_{\mathrm{int}} = 1.6 \pm 1.2$ (statistical) $\pm\, 0.4$ (systematic) and a flux 
normalization of $F_{0,\mathrm{int}} = (7.52 \pm 2.03\ \text{(statistical)}\ ^{+4.53}_{-3.84}\ \text{(systematic)}) \times 10^{-10}$ 
TeV$^{-1}$ cm$^{-2}$ s$^{-1}$, evaluated at an energy of $E_{0,\mathrm{int}} = 0.154$ TeV (see Methods). 
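
The fitted function can be written out as a short sketch. It assumes the normalization is read as 7.52 × 10^-10 TeV^-1 cm^-2 s^-1 and uses only the central values quoted above; the function names are illustrative, and the optical depth τ is left as an input because the EBL model values are not given in the text.

```python
import math

# Central best-fit values quoted in the text (uncertainties omitted).
F0_INT = 7.52e-10   # TeV^-1 cm^-2 s^-1, flux normalization at E0_INT (exponent as read here)
E0_INT = 0.154      # TeV, reference energy
GAMMA_INT = 1.6     # intrinsic photon index

def intrinsic_flux(energy_tev, f0=F0_INT, e0=E0_INT, gamma=GAMMA_INT):
    """Intrinsic power law F_int(E) = F0 * (E / E0)^(-gamma)."""
    return f0 * (energy_tev / e0) ** (-gamma)

def observed_flux(energy_tev, tau, f0=F0_INT, e0=E0_INT, gamma=GAMMA_INT):
    """Observed flux F_obs(E) = F_int(E) * exp(-tau).

    tau is the EBL optical depth at this energy and redshift; it has to be
    supplied by the caller from whichever EBL model is adopted.
    """
    return intrinsic_flux(energy_tev, f0, e0, gamma) * math.exp(-tau)

# Hypothetical optical depth, for illustration only (not a value from the paper).
example = observed_flux(0.3, tau=1.0)
```

Because the attenuation grows with energy, the observed spectrum is steeper than the intrinsic one, which is why the intrinsic index is fitted with the absorption term included.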

The very-high-energy (VHE) flux, together with measurements at 
other wavelengths, is shown in Fig. 1. Apart from the exceptionally high 
flux level, the light curves show a typical power-law behaviour in the 
X-ray and optical afterglow with a temporal flux decay of the form 
$F(t) \propto t^{-\alpha}$ with $\alpha_{\mathrm{XRT}} = 1.29 \pm 0.01$ and $\alpha_{\mathrm{optical}} = 1.24 \pm 0.02$. The spectrum 
measured by Fermi-LAT (100 MeV-10 GeV) from T0 + 55 s to T0 + 700 s 
is well fitted by a power-law model with photon index $\gamma_{\mathrm{LAT}} = 2.10 \pm 0.10$. 
The light curve in the same time window is fitted by a power law 
with a temporal decay index of $\alpha_{\mathrm{LAT}} = 1.83 \pm 0.25$. It is worth noting that 
$\alpha_{\mathrm{LAT}}$ is at about 1σ from the mean value of the distribution of the 
decay indices of long GRBs detected by Fermi-LAT“ ($\bar{\alpha}_{\mathrm{LAT}} = 0.99 \pm 0.04$, 
$\sigma_{\alpha} = 0.80 \pm 0.07$) and such deviation could largely depend on the time 
range in which $\alpha_{\mathrm{LAT}}$ is fitted, potentially in agreement with $\alpha_{\mathrm{XRT}}$ 
and $\alpha_{\mathrm{optical}}$. 

The detection of VHE γ-ray emission indicates the presence of very 
energetic particles in the GRB afterglow. This discovery is consistent 
with efficient γ-ray emission seen in other astrophysical sources with 
relativistic plasma outflow, for example, pulsar wind nebulae or jets 
emerging from the nuclei of active galaxies. In the case of a GRB afterglow, 
the particle acceleration probably occurs at the forward shock” 
(the compression shock wave propagating through the circumburst 
material), which should be capable of efficient electron and proton 
acceleration. As proton radiation processes are characterized by 
long energy-loss timescales relative to the dynamical timescale, the 
detected γ-ray emission is probably produced by accelerated electrons 
(see Methods). Therefore, two radiation processes are the most 
plausible dominant contributions to the VHE spectrum: synchrotron 
emission of an electron population in the local magnetic field’® and 
synchrotron self-Compton (SSC) scattering”””®. In the latter case, the 
synchrotron photons, which are thought to dominate the target radiation”, 
are inverse-Compton-scattered to higher energies by the same 
electron population. 

The SSC and synchrotron emission origin scenarios’ place distinctly 
different demands on the source acceleration efficiency. Whereas an 
SSC origin requires electrons with only multi-gigaelectronvolt energies, 
a synchrotron origin requires an extreme accelerator potentially 
accelerating beyond petaelectronvolt energies”° (see Methods). Furthermore, 
for GRBs to operate as 10”° eV cosmic-ray sources, they must 
achieve extreme acceleration”. One key distinguishing characteristic 
between these two emission origins is that SSC predicts the presence 
of two bumps in the spectral-energy distribution with their height 
ratio depending on the energy densities of both the electrons and the 
magnetic field, whereas a synchrotron model implies only a broad 
single component. A second difference between these processes is 
the maximum photon energy achievable. 

Fig. 1 | Multi-wavelength light curve of GRB 180720B. a, Energy-flux light 
curve detected by Fermi-GBM (band fit; green), Fermi-LAT (power law; blue), 
H.E.S.S. (power-law intrinsic; red) and the optical r-band (purple). The Swift- 
BAT spectra (15 keV-150 keV) are extrapolated to the XRT band (0.3-10 keV) to 
produce a combined light curve (grey) and an upper limit (95% confidence 
level) for the second H.E.S.S. observation window (power-law intrinsic, 
red arrow). The black dashed line indicates a temporal decay with α = -1.2. 
b, Photon index of the Fermi-LAT, Swift and H.E.S.S. spectra. Error bars 
correspond to 1σ. 

Fig. 2 | Very-high-energy γ-ray image of GRB 180720B. Significance map of 
GRB 180720B field, as observed by H.E.S.S. a, Observation made at T0 + 10.1 h 
for 2 h. b, The same region of the sky, as observed during consecutive nights 
between T0 + 18.4 d and T0 + 24.4 d. The red cross indicates the position 
reported by the optical telescope ISON-Castelgrande”. 
Considering a synchrotron origin of the broadband afterglow energy 
spectrum, the highest energy for synchrotron emission from electrons 
in a maximally efficient accelerator is $E_{\max} = 9\Gamma mc^{2}/(4\alpha_{\mathrm{F}}) \approx 100\,\Gamma$ MeV 
(with $\alpha_{\mathrm{F}}$ the fine-structure constant and Γ the bulk Lorentz factor of the 
forward shock). Thus, for electron synchrotron emission to reach energies 
beyond 100 GeV 10 h after the prompt emission, a late-time Γ in 
excess of 1,000 appears to be required. Such a scenario is difficult to 
realize, with robust expectations suggesting a value of Γ ≈ 20 at 10 h 
(see Methods). Alternatively, circumvention of this synchrotron maximum 
energy limit is possible for scenarios in which either the coherence 
length of the magnetic turbulence is very small, or different 
magnetic-field strengths are present in the acceleration and emission 
zones, or some non-ideal process is responsible for the particle acceleration 
(see Methods). Regardless of this challenge, this could explain 
the similarity in the photon index and level of energy flux of the emission 
seen both at early times by Fermi-LAT and Swift-XRT and at 
late times by H.E.S.S. and Swift-XRT (Fig. 1). However, the strong requirements 
for synchrotron emission to extend up to the VHE regime 
disfavour such an origin, and the potential onset of a new SSC component 
should be considered. 
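
The size of the gap can be checked with a few lines of arithmetic. This is a rough sketch that assumes m in the expression above is the electron mass; the function names are illustrative, and the rounded coefficient of 100 MeV per unit Γ is the one quoted in the text.

```python
M_E_C2_MEV = 0.511          # electron rest energy in MeV (assumes m is the electron mass)
ALPHA_F = 1.0 / 137.036     # fine-structure constant

def max_synchrotron_energy_mev(bulk_lorentz_factor):
    """Synchrotron limit 9 * Gamma * m c^2 / (4 * alpha_F), in MeV."""
    return 9.0 * bulk_lorentz_factor * M_E_C2_MEV / (4.0 * ALPHA_F)

def required_lorentz_factor(photon_energy_mev, mev_per_gamma=100.0):
    """Bulk Lorentz factor needed to reach a photon energy, using the
    rounded ~100 MeV per unit Gamma quoted in the text."""
    return photon_energy_mev / mev_per_gamma

# 100 GeV photons expressed in MeV.
gamma_needed = required_lorentz_factor(100.0e3)

# Limit for the late-time Lorentz factor of about 20 discussed in the text.
limit_at_gamma_20 = max_synchrotron_energy_mev(20.0)
```

With these inputs the required Lorentz factor is of order 1,000, whereas a value near 20 caps synchrotron photons at a few gigaelectronvolts, which is the tension the paragraph describes. The unrounded coefficient comes out somewhat above 100 MeV, so the comparison is an order-of-magnitude one.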

The SSC scenario has the advantage that the emission up to VHE 
at late times is energetically much more easily achievable”, leading 
to the expectation of a new spectral component at VHE. The H.E.S.S. 
spectral-fit constraints (Extended Data Fig. 1) are consistent with such 
a possibility within the present uncertainties. Despite this advantage, 
the potential onset of inverse Compton emission within the Klein-Nishina 
regime faces challenges (see Methods). Specifically, beyond 
the γ-ray energy where this sets in, a softer spectral slope and a different 
brightness evolution of this component” are expected. However, 
interestingly, the presence of synchrotron emission with a hard photon 
index extending below kiloelectronvolt energies can sufficiently 
delay the onset of the full Klein-Nishina transition to higher energies 
(see Methods), beyond that of the VHE detection. The detection of 
this hard extended synchrotron emission component thus delivers 
additional supporting evidence for an SSC origin. 

This VHE discovery undoubtedly opens a key channel to the under- 
standing of the GRB afterglow phenomena. This measurement proves 
to be complementary to the VHE-afterglow emission detected in GRB 
190829A” and the prompt-to-early afterglow emission measured in 
GRB 190114C by the MAGIC telescopes”, providing insight into the 
nature of GRBs and their VHE detectability. We estimate that future 
instruments, such as the Cherenkov Telescope Array”, will allow up 
to three more GRB afterglow detections per year in the VHE domain 
than previously anticipated (see Methods), considerably improving 
our understanding of GRBs over a diverse range of timescales. 
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Methods 


H.E.S.S. and the GRB follow-up programme 

The observations presented in this paper were performed using the 
H.E.S.S. array of imaging atmospheric Cherenkov telescopes, which is 
situated at an altitude of 1,800 m in the Khomas highlands of Namibia. 
H.E.S.S. is sensitive to γ-rays in the energy range from tens of gigaelectronvolts 
to tens of teraelectronvolts. It consists of five Cherenkov 
telescopes: four with mirror areas of 108 m² placed in a square configuration 
with a side length of 120 m (CT1-CT4) and a single telescope at 
the centre (CT5) with a mirror area of 614 m². Thanks to its low energy 
threshold and fast slewing (200° min$^{-1}$)”®, CT5 is well suited for the 
observation of soft-spectrum transient sources. 

H.E.S.S. maintains an active transient-source observation programme, 
of which GRBs are an important component. To ensure a 
fast reaction to GRB alerts, H.E.S.S. is connected to the γ-ray coordinates 
network (GCN), which rapidly distributes alerts and 
observational information from space and ground-based facilities. 
The target-of-opportunity observation system in H.E.S.S. performs the 
selection, filtering and processing of these alerts on the basis of source 
observability and significance, aiming to trigger on bright, precisely 
located, nearby bursts. Alerts are followed up in two different observation 
modes. Observations are triggered in the prompt mode when the 
GRB position is observable from the H.E.S.S. site at the time that the 
alert is received. In this case, the observation schedule is interrupted 
and the array is automatically re-pointed to the GRB location. On the 
other hand, afterglow observations take place for GRBs that become 
observable only at a later time; such observations are scheduled manually 
and are triggered by a burst advocate. This was the case for GRB 
180720B, which was observed from T0 + 10.1 h, when the burst position 
rose above 45° in elevation (below this elevation GRBs are typically 
not observed owing to the rapid increase in the energy threshold of 
the H.E.S.S. telescopes). Re-observations were carried out at T0 + 18 d, 
after the end of an intervening moonlight period. 
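
The two follow-up modes can be summarized as a small decision rule. The sketch below is a simplification under stated assumptions and does not describe the actual H.E.S.S. target-of-opportunity software: the `Alert` class, its fields and the `choose_mode` function are hypothetical, and the filtering on significance, localization and distance mentioned above is left out. Only the 45° elevation limit comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

MIN_ELEVATION_DEG = 45.0  # elevation limit quoted in the text

@dataclass
class Alert:
    """Hypothetical container for the observability of a GRB alert."""
    name: str
    elevation_now_deg: float                 # elevation at the time the alert arrives
    hours_until_observable: Optional[float]  # None if it never rises above the limit

def choose_mode(alert: Alert, site_is_dark: bool = True) -> str:
    """Return 'prompt', 'afterglow' or 'no_follow_up' for an alert."""
    # Prompt mode: the position is observable when the alert is received,
    # so the schedule is interrupted and the array re-pointed automatically.
    if site_is_dark and alert.elevation_now_deg >= MIN_ELEVATION_DEG:
        return "prompt"
    # Afterglow mode: the position becomes observable only later, so the
    # observation is scheduled manually by a burst advocate.
    if alert.hours_until_observable is not None:
        return "afterglow"
    return "no_follow_up"

# Illustrative values: the 10.1 h delay is from the text, while the elevation
# at alert time is made up for this example.
grb = Alert("GRB 180720B", elevation_now_deg=-20.0, hours_until_observable=10.1)
mode = choose_mode(grb)
```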


H.E.S.S. data analysis 
To reach the lowest possible energies in the analysis presented here, 
we use only data from the single large telescope (CT5). However, this 
energy threshold reduction comes at the cost of some angular resolution 
and sensitivity loss*’. We present here two hours of observations 
taken in wobble mode”, with the pointing direction of the telescope at 
an offset of 0.5° from the position provided by Swift-BAT®. This observation 
was made at a mean zenith angle of 31.5° for a total live time of 1.8 h. 

To ensure that a potential GRB signal is not diminished by an excessive 
number of statistical trials, the data analysis is subjected to a strict 
unblinding procedure. The first step in this unblinding is an inspection 
of the low-level data, as some calibration artefacts can directly lead to 
the creation of spurious sources in the field of view. Checks are made 
on the fractional event participation of each camera pixel (to ensure 
that single faulty pixels do not dominate the events), the pixel pedestal 
values and the distribution of events within the field of view. Once 
these checks are completed, with no artefacts found, the event properties 
are reconstructed using the ImPACT*? maximum-likelihood-based 
fitting technique. Background cosmic-ray events are rejected 
using a neural-network-based scheme”. The residual background 
contamination level of the source region (ON and OFF events) and 
the ratio of the on-source time to the off-source time ($\alpha_{\mathrm{exp}}$) are then 
estimated using the ring method for the production of maps and the 
reflected-region method when performing the spectral extraction”. 
Full analysis and checks are performed using an additional independent 
calibration and data analysis chain®, serving as a cross-check of 
all the results. 

The source significance is computed using a maximum-likelihood 
ratio test based on the number of events coming from the source (ON) 
and the background (OFF) for a given ratio of on-source to off-source 
time (α; ref. °°). For the ring method, the number of ON and OFF events 
is 544 and 4,740, respectively, and $\alpha_{\mathrm{exp}}$ = 0.09, resulting in a significance 
of detection of 5.3σ. Similarly the reflected-region method measures 
544 ON and 3,998 OFF events and $\alpha_{\mathrm{exp}}$ = 0.11 at a significance of 4.6σ, 
which is verified by the cross-check analysis, which provided 651 ON 
and 5,200 OFF events and $\alpha_{\mathrm{exp}}$ = 0.10 with a significance of 4.5σ. 
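
A likelihood-ratio significance of this kind can be sketched with the Li & Ma (1983) formula, a common choice for ON/OFF counting analyses. It is an assumption here that this is the test meant by the cited reference, and the function names are illustrative. The exposure ratios in the text are rounded to two decimal places, so the values computed from them agree with the quoted significances only approximately.

```python
import math

def li_ma_significance(n_on, n_off, alpha):
    """Li & Ma (1983) likelihood-ratio significance for ON/OFF counts.

    alpha is the ratio of on-source to off-source exposure.
    """
    total = n_on + n_off
    term_on = n_on * math.log((1.0 + alpha) / alpha * n_on / total)
    term_off = n_off * math.log((1.0 + alpha) * n_off / total)
    return math.sqrt(2.0 * (term_on + term_off))

def excess_events(n_on, n_off, alpha):
    """Excess over the background expected in the ON region."""
    return n_on - alpha * n_off

# Counts and rounded exposure ratios quoted in the text.
ring_significance = li_ma_significance(544, 4740, 0.09)
reflected_significance = li_ma_significance(544, 3998, 0.11)
ring_excess = excess_events(544, 4740, 0.09)
```

The result is sensitive to small changes in the exposure ratio, which is one reason to carry the unrounded value through an analysis.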

The source morphology is fitted with a two-dimensional likelihood 
procedure by assuming point-like and Gaussian source models con- 
volved with the expected energy-dependent point spread function 
(obtained from simulations) and the measured source spectrum. Both 
source models are proved to be compatible with the morphology of 
the discovered source, with no statistically significant preference for 
source extension shown. 

Spectral analysis is performed using the forward-folding method”,
which corrects for the limited energy resolution of the single-telescope
event reconstruction. The measured source spectrum is obtained by
fitting a simple power-law model of the form $F_{\rm obs}(E) = F_{0,{\rm obs}}(E/E_{0,{\rm obs}})^{-\gamma_{\rm obs}}$,
where $F_{0,{\rm obs}}$ is the flux normalization, $\gamma_{\rm obs}$ is the photon index and $E_{0,{\rm obs}}$
is the reference energy. However, owing to the absorption of the most
energetic photons by the extragalactic background light (EBL), the
apparent photon index of this source will be somewhat steeper than the
intrinsic photon index. The intrinsic spectrum $F_{\rm int}(E)$ is therefore
obtained by fitting the measured spectrum with an attenuated
power-law model, $F_{\rm obs}(E) = F_{\rm int}(E) \times e^{-\tau(E,z)} = F_{0,{\rm int}}(E/E_{0,{\rm int}})^{-\gamma_{\rm int}} \times e^{-\tau(E,z)}$,
where the last term in the equation corresponds to the EBL absorption
coefficient predicted” for a redshift of 0.653. The best-fit
spectra, together with the spectral points, are shown in Extended Data
Fig. 1, and the spectral parameters are summarized in Extended Data
Table 1.
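
To illustrate why absorption makes the apparent photon index steeper than the intrinsic one, the sketch below multiplies a power law by an attenuation factor exp(-τ) and measures the resulting two-point index across 100-440 GeV. The optical depth used here is a made-up placeholder that simply rises with energy; it is not one of the EBL models used in this work, and all function names and parameter values are hypothetical.

```python
import math


def power_law(energy_gev, norm, index, e_ref_gev):
    """Intrinsic power law F_int(E) = F0 * (E / E0)**(-index)."""
    return norm * (energy_gev / e_ref_gev) ** (-index)


def toy_tau(energy_gev):
    """Placeholder optical depth rising with energy; NOT a real EBL model."""
    return 1.0 * (energy_gev / 100.0)


def observed_flux(energy_gev, norm, index, e_ref_gev, tau=toy_tau):
    """Attenuated spectrum F_obs(E) = F_int(E) * exp(-tau(E))."""
    return power_law(energy_gev, norm, index, e_ref_gev) * math.exp(-tau(energy_gev))


def apparent_index(e1_gev, e2_gev, flux):
    """Two-point photon index between e1 and e2, i.e. -dlnF/dlnE."""
    return -math.log(flux(e2_gev) / flux(e1_gev)) / math.log(e2_gev / e1_gev)


gamma_int = 1.6  # illustrative intrinsic index
gamma_app = apparent_index(
    100.0, 440.0, lambda e: observed_flux(e, 1.0, gamma_int, 100.0)
)
print(f"intrinsic index {gamma_int}, apparent index {gamma_app:.2f}")
```

Any optical depth that grows with energy yields an apparent index larger than the intrinsic one, which is the qualitative behaviour described in the text.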

Systematic uncertainties in the fitted spectra are determined by 
accounting for a 15% uncertainty in the reconstructed energy due to 
possible variations in the measured Cherenkov light yield®®. The measured
energy is systematically shifted by ±15% and the whole spectral-fitting
procedure is redone. In addition, short dips in the trigger rate
(at the level of 30%) were identified in the data, which can probably 
be attributed to the presence of high-altitude clouds. To assess the 
effect of these, the time windows containing such trigger rate features 
(21.7 min total) were removed from the data and the standard analysis 
described above was performed on the reduced dataset. From this, we 
conclude an additional systematic underestimation of 32% and 4.8% in 
the measured normalization and photon index, respectively. These two 
sources of systematic uncertainty are considered to be independent 
and are therefore added in quadrature for the estimation of the total 
systematic uncertainty. 
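
The quadrature sum mentioned above can be sketched as follows. The input values are hypothetical placeholders chosen only to show the arithmetic, since the text does not list the individual contributions side by side.

```python
import math


def total_systematic(*components):
    """Combine independent fractional uncertainties in quadrature."""
    return math.sqrt(sum(c * c for c in components))


# Hypothetical fractional uncertainties, for illustration only.
example_total = total_systematic(0.03, 0.04)
print(f"combined systematic uncertainty: {example_total:.3f}")
```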

The intrinsic spectrum was obtained with a chosen EBL model”. To 
determine how this choice influences the results presented in this work, 
the data were re-analysed using three additional EBL models” *!, each 
one employing a different approach to predict the overall EBL level”. 
The absorption coefficient for a redshift of 0.653 within the energy 
range of the detected emission does not present sizeable deviations 
between the models considered (Extended Data Fig. 2). When employ- 
ing these EBL models for the spectral fit, a change of up to 55.3% and 
27% was found in the reported normalization and index, respectively. 
The statistical uncertainty on the fitted spectra remains the biggest 
source of uncertainty in the results. 


Trial correction 

Since 2012, H.E.S.S. has performed five additional follow-up observations
of well localized GRBs (Swift and Fermi-LAT alerts) using only CT5
(similar to the observations presented here). The significance distribution
of this sample (excluding GRB 180720B) is consistent with pure
statistical fluctuations. Therefore, the post-trial significance for GRB
180720B can be assessed by accounting for these previously observed
GRBs. This results in a post-trial significance of 4.3σ (reflected-region
method) and 5.0σ (ring method). As the analysis of GRB 180720B was
performed once under the aforementioned unblinding procedure,
no additional trials have been added to the results presented here.
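
A minimal sketch of a trial correction is given below. It assumes six independent trials (the five earlier follow-ups plus GRB 180720B) and one-sided Gaussian tail probabilities; both are assumptions made for illustration, and the function name is hypothetical. With the rounded pre-trial values from the text it gives roughly 5.0σ and 4.2σ, close to the quoted post-trial significances.

```python
import math
from statistics import NormalDist


def post_trial_significance(z_pre: float, n_trials: int) -> float:
    """Convert a pre-trial significance into a post-trial one for n independent trials."""
    # One-sided Gaussian tail probability of the pre-trial significance.
    p_pre = 0.5 * math.erfc(z_pre / math.sqrt(2.0))
    # Probability of at least one such fluctuation in n trials: 1 - (1 - p)^n.
    p_post = -math.expm1(n_trials * math.log1p(-p_pre))
    # Back to Gaussian-equivalent significance.
    return -NormalDist().inv_cdf(p_post)


ring_post = post_trial_significance(5.3, 6)
reflected_post = post_trial_significance(4.6, 6)
print(f"ring: {ring_post:.2f} sigma, reflected regions: {reflected_post:.2f} sigma")
```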


Background systematic effects 

Systematic effects on the sky map background (Fig. 2) were determined
by measuring the significance distribution when excluding the source
region. Although a normal distribution was expected, a width of 1.09
was measured in this significance distribution, therefore adding a slight
shift to the reported significance of the ring method (used in the production
of sky maps). The corrected significance when accounting
for such effects is 4.9σ (4.7σ post-trial). Nonetheless, this measured
distribution depends strongly on the parameters of the ring method
and should be subject to statistical uncertainties.
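
One simple way to account for a background significance distribution that is wider than unity is to rescale the significance by the measured width. The sketch below assumes that this is the form of the correction, which the text does not spell out; the function name is hypothetical.

```python
def rescale_significance(z: float, width: float) -> float:
    """Rescale a significance by the measured width of the background-only distribution."""
    return z / width


# Pre-trial ring-method significance and the measured width quoted in the text.
corrected = rescale_significance(5.3, 1.09)
print(f"corrected significance: {corrected:.1f} sigma")
```

Under this assumption the pre-trial ring-method value rounds to the corrected significance quoted above.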


Fermi data analysis 
The Fermi-GBM data for GRBs are publicly available through the GBM
Burst Catalog at HEASARC™. For GRB 180720B the available time-tagged
events of those detectors having the best viewing angle to the Swift-XRT
position (namely, n6, n7, nb and b1) were analysed. Temporally
resolved energy-flux data points (Fig. 1) were obtained with the RMfit
analysis software” by combining time-tagged event data from all four
detectors into 256-ms bins in the energy range from 8 keV to 10 MeV.
The analysis of the Fermi-LAT data was performed using the ‘Pass 8’**
processed events. We used the P8R3_TRANSIENT010E event class,
which is suitable for transient-source analysis, and the corresponding
instrument response functions*. Events were selected from $T_0$ to
$T_0$ + 700 s in the standard GRB analysis energy range of 100 MeV-
100 GeV over a region of 10° around the Swift-XRT localization. Event
selection, quality cuts and data analysis were performed with the standard
FermiTools* software. The source detection over the full duration
was determined by a likelihood analysis providing a test-statistic value
of TS = 600, which corresponds to a significance of σ ≈ 25 (σ ≈ √TS).
Because the highest-energy photon detected has an energy of 5 GeV
(at $T_0$ + 142.4 s), the temporally resolved energy-flux data points (Fig. 1)
were computed in the energy range from 100 MeV to 10 GeV. The analysis
model included the Galactic interstellar emission model (gll_iem_v06.fits)
and the relative isotropic-diffuse-emission templates provided
by the Fermi-LAT collaboration”, and the normalization of the latter
was left free to vary. The spectrum for each bin was fitted by a
single power-law model $F(E) = F_0 \times (E/E_0)^{-\gamma}$, with the flux normalization
$F_0$ and the photon index γ as free parameters. As no emission with energies
>10 GeV was detected, no additional term was required to account
for EBL absorption” in the spectra. The temporal decay $\alpha_{\rm LAT}$ was fitted
by a power-law model using a least-squares technique applied from
$T_0$ + 55 s to $T_0$ + 700 s in order to ensure no contamination of the prompt
emission observed by Fermi-GBM and Swift-BAT, obtaining a reduced
χ² of 0.63 (14 degrees of freedom).
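
The least-squares determination of a temporal decay index can be sketched as a straight-line fit in log-log space. The example below is unweighted and uses synthetic, noise-free data, whereas a real light-curve fit would weight each point by its uncertainty; the function name and all numerical values are hypothetical.

```python
import math


def fit_power_law_decay(times, fluxes):
    """Ordinary least squares in log-log space for F(t) = A * t**(-alpha)."""
    xs = [math.log(t) for t in times]
    ys = [math.log(f) for f in fluxes]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    amplitude = math.exp(y_mean - slope * x_mean)
    # The decay index is minus the slope of ln F versus ln t.
    return amplitude, -slope


# Synthetic, noise-free light curve between 55 s and 700 s (illustrative values only).
times = [55.0, 100.0, 200.0, 400.0, 700.0]
fluxes = [3.0e-6 * t ** -1.5 for t in times]
amplitude, alpha = fit_power_law_decay(times, fluxes)
print(f"amplitude = {amplitude:.3e}, decay index = {alpha:.3f}")
```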


Optical data 

The optical data shown in Fig. 1 were compiled from the GCN circulars
of observations performed in the r-band by the following instruments:
Kanata*®, MITSUME”, TSHAO*°, MASTER-K™, MASTER-I, ISON-Castelgrande”,
OSN”, LCO™* and KAIT™. The reported temporal decay index
$\alpha_{\rm optical}$ was measured from $T_0$ + 9,642 s to $T_0$ + 3.35 x 10° s by performing
a power-law fit with a χ² fitting procedure.


Swift data 

The Swift data are publicly available through the Swift online repository’.
The temporally resolved energy-flux data shown in Fig. 1 were
obtained using the Burst Analyser tool’°”. The data were rebinned
to give a signal-to-noise ratio of 7 and systematic uncertainties were
included. The temporal decay reported here ($\alpha_{\rm XRT}$) was obtained
from $T_0$ + 2,200 s to $T_0$ + 3.05 x 10° s and corresponds to the fourth
break in the light curve, as identified from the fitting procedures of
the Swift-XRT tools.


Cherenkov Telescope Array detectability prospects 

Considering the Cherenkov Telescope Array (CTA) to be an order of 
magnitude more sensitive than the H.E.S.S. array implies that it will 
have the ability to detect energy fluxes ~10 times fainter than that of 
GRB180720B at VHE. If the VHE flux equals that detected by Swift-XRT, 
as suggested by our measurements (Fig. 1), we estimate the occurrence 
of three GRBs per year above this flux, which will therefore be detect- 
able by CTA (Extended Data Fig. 3). This number could be increased 
for follow-up observations at earlier times. By assuming a temporal
decay value of α = 1.2 ($F(t) \propto t^{-\alpha}$) for all the GRB afterglows detected by
Swift-XRT®, an extrapolation of the 11-h energy flux to that expected
at 5h provides a detectability prospect of ~10 GRBs per year at such 
follow-up delay times. It should be noted, however, that the presence 
of VHE emission could also be dependent on the GRB environment”, 
and this influence was not considered in this estimation. 
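
The extrapolation from 11 h back to 5 h follows directly from the assumed power-law decay. The short sketch below (with a hypothetical function name) shows that a decay index of 1.2 makes the flux at 5 h about 2.6 times higher than at 11 h, which is what raises the number of afterglows above a fixed detection threshold.

```python
def flux_ratio(t_early_h: float, t_late_h: float, alpha: float) -> float:
    """Ratio F(t_early) / F(t_late) for a power-law decay F(t) proportional to t**(-alpha)."""
    return (t_late_h / t_early_h) ** alpha


gain = flux_ratio(5.0, 11.0, 1.2)
print(f"flux at 5 h relative to 11 h: {gain:.2f}")
```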


Bulk Lorentz factor 

The bulk Lorentz factor depends on two factors: the released energy and
the density of the circumburst medium®, $\Gamma = \sqrt{E_{\rm iso}/(Mc^2)}$, where $E_{\rm iso}$ is
the equivalent isotropic energy and $M$ is the total mass swept up by the
shock. The latter depends on the nature of the circumburst environment:
$M = (4\pi/3)R^3 n m_{\rm p}$ for a homogeneous medium (here $n$ is the medium
number density, $R$ is the shock radius and $m_{\rm p}$ is the proton mass) and
$M = \dot{M}_* R/v_*$ for a shock propagating in a constant-velocity wind (here $\dot{M}_*$
and $v_*$ are the wind mass-loss rate and velocity, respectively). The shock
radius depends on the detection time as $R = A_R\Gamma^2 tc/(1+z)$, where $A_R = 8$
for a homogeneous medium and $A_R = 4$ for wind environments ($c$, speed
of light in vacuum). Thus, for GRB 180720B ($t = 10$ h, $z = 0.653$,
and $E_{\rm iso} \approx 10^{54}$ erg) one obtains $\Gamma \simeq 15\,n_0^{-1/8}$ (here $n_0 = n/(1\,{\rm cm}^{-3})$) for a homogeneous
medium or $\Gamma \simeq 20\,\dot{M}_{-5}^{-1/4}v_{*,3.3}^{1/4}$ (here $\dot{M}_{-5} = \dot{M}_*/(10^{-5}\,M_\odot\,{\rm yr}^{-1})$ and
$v_{*,3.3} = v_*/(2{,}000\,{\rm km\,s}^{-1})$) for a wind environment.
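
The scalings above can be checked numerically by substituting the shock radius into the swept-up mass and solving for the Lorentz factor. The sketch below does this in CGS units, taking an isotropic energy of 10^54 erg as an assumed round value; the function names and physical constants are supplied here for illustration. It returns values of about 14 (homogeneous medium) and about 19 (wind), the same order as the rounded coefficients quoted in the text.

```python
import math

C = 2.998e10       # speed of light, cm/s
M_P = 1.6726e-24   # proton mass, g
M_SUN = 1.989e33   # solar mass, g
YEAR = 3.156e7     # one year, s


def gamma_homogeneous(e_iso_erg, t_s, z, n_cm3, a_r=8.0):
    """Solve Gamma^2 = E_iso / (M c^2) with M = (4 pi / 3) R^3 n m_p
    and R = A_R Gamma^2 t c / (1 + z); this gives Gamma^8 = const."""
    r_over_gamma2 = a_r * t_s * C / (1.0 + z)
    mass_over_gamma6 = (4.0 * math.pi / 3.0) * r_over_gamma2 ** 3 * n_cm3 * M_P
    return (e_iso_erg / (mass_over_gamma6 * C ** 2)) ** (1.0 / 8.0)


def gamma_wind(e_iso_erg, t_s, z, mdot_g_s, v_cm_s, a_r=4.0):
    """Same balance with M = Mdot R / v for a constant-velocity wind; Gamma^4 = const."""
    r_over_gamma2 = a_r * t_s * C / (1.0 + z)
    mass_over_gamma2 = mdot_g_s * r_over_gamma2 / v_cm_s
    return (e_iso_erg / (mass_over_gamma2 * C ** 2)) ** 0.25


E_ISO = 1.0e54          # erg; assumed round value for illustration
T_OBS = 10.0 * 3600.0   # 10 h in seconds
g_ism = gamma_homogeneous(E_ISO, T_OBS, 0.653, 1.0)
g_wind = gamma_wind(E_ISO, T_OBS, 0.653, 1.0e-5 * M_SUN / YEAR, 2.0e8)
print(f"homogeneous medium: {g_ism:.1f}, wind: {g_wind:.1f}")
```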


Non-thermal process efficiency 

The non-thermal process efficiency, $\kappa = t_{\rm dyn}/t_{\rm cool}$, depends on the ratio
of the shock dynamic timescale, $t_{\rm dyn} = R/(c\Gamma)$, to the cooling time, $t_{\rm cool}$,
which depends on the radiation mechanism, the density of the target
and the energy of the parent particles. For hadronic processes”, which
include proton-proton (pp) and photon-meson (pγ) channels, the
radiation efficiency is x, ~10’[R/(10°cm)]n, and k,, ~3 x 10-“(//20)"[R/ 
(10'8 cm)](x/10)[E,/(1 keV) | ‘ny (here x is the total radiative efficiency 
and E£, is the peak frequency of the soft-emission component). These 
low efficiencies favour the electromagnetic processes”. The efficiency 
of the synchrotron channel for the emission detected in the VHE band, 
E=100 GeV, is Kgynco= 5 * 10” (m,/m)*?[R/(10"8 cm) ](n,/0.1)74n 2ne*, 
where m, and m are the masses of the electron and the emitting 
particle, respectively, 7, is a fraction of the internal energy contained in 
the magnetic field and y,,,, = max(1, Riar/A cor) defines the shift of the 
peak energy if a charged particle interacts with a turbulent magnetic field”
(here $R_{\rm Lar}$ is the non-relativistic Larmor radius and $\lambda_{\rm cor}$ is the magnetic-field
correlation length). If the inverse Compton scattering proceeds in the
Thomson regime, thenx,-=3(//20)[k/(107)] [R/(10"8 cm) ][E,/(1keV)]*71No. 
Efficiencies larger than 1 indicate that particle cooling occurs faster than 
the source dynamical timescale and is therefore highly efficient. 


Synchrotron emission 

Synchrotron emission is characterized by the highest radiation 
efficiency, but this emission component peaks below the limiting 
energy Of Fyyne= LOO! (m/mMe)N,.4(Bem/Bac) (Eac/Bac) MeV. Here Bem and 
$B_{\rm ac}$ are the magnetic-field strengths at the emitter and accelerator sites,
respectively. The accelerating electric field, $E_{\rm ac}$, is smaller than the
magnetic field, $E_{\rm ac} < B_{\rm ac}$, if the particle acceleration proceeds in ideal
magnetohydrodynamic flows”. Thus, the production of VHE γ-rays via
electron synchrotron emission requires a large Lorentz factor, [> 10°,
a very-small-scale magnetic turbulence, A... <10 7R™,, a large change
of the magnetic-field strength, B,,, > 10’B,., particle acceleration to
operate in the non-ideal magnetohydrodynamic regime, or a combination
of these factors. Proton synchrotron emission alleviates these
requirements, but at the expense of a significantly lower radiation 
efficiency. Whereas proton synchrotron emission dominates over other 
hadronic radiation processes in terms of efficiency’, its efficiency is 
still considerably smaller than that of electrons. Thus proton synchrotron
emission is expected to give rise only to a subdominant emission
component within the VHE band.


Energy of particles emitting in the VHE regime 

The energy of particles emitting in the VHE regime depends on the
dominant radiation mechanism and the properties of the ejecta. In the
case of a synchrotron origin scenario, the particle energy is determined
by three important factors: the shock Lorentz factor, the strength of
the magnetic field and the turbulence scale. The first factor, Γ ≈ 20, is
relatively well defined by the epoch of the H.E.S.S. observation, but the
magnetic-field strength and the possibility of small-scale turbulence
remain highly uncertain. The internal energy density, $\approx 0.1(\Gamma/20)^2 n_0$ J m$^{-3}$,
suggests that a Gauss-strength magnetic field is expected for the case 
of energy equipartition between the magnetic field and particles. We 
note, however, that substantially smaller plasma magnetization is 
reported in the literature’, corresponding to weaker magnetic fields 
by several orders of magnitude. Assuming that synchrotron emission 
beyond the 100 MeV energy limit in the co-moving frame can be 
achieved, the energy of the emitting electrons can be estimated 
as E, ~ 4[E/(100 keV)}"/7(/20) ¥7[B/(0.1G)] 77,1? TeV. The produc- 
tion of 100-GeV y-rays through a synchrotron scenario therefore 
requires electrons of ultrahigh energy, $E_{\rm e}$ ≈ 4 PeV, unless a configuration
with a very-small-scale turbulence is present. The energy of particles
that provide the dominant contribution to the inverse Compton emission
depends strongly on the spectrum of the target photons and the
bulk Lorentz factor. An electron with energy $E_{\rm e}$ up-scatters a target
photon with energy $E_{\rm t}$ to an energy of $\min\{E_{\rm t}[E_{\rm e}/(m_{\rm e}c^2)]^2, E_{\rm e}\}$. For target
photons detected in the X-ray energy band, $E_{\rm t}$ ≈ 1 keV, electrons with
energy of $E_{\rm e}$ ≈ 10 GeV, which in the laboratory frame have an energy of
hundreds of gigaelectronvolts, can produce γ-rays that are detected
in the VHE band.
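
The inverse Compton estimate above can be illustrated with a few lines of arithmetic. The sketch assumes that energies transform between the laboratory and co-moving frames by a simple factor of the bulk Lorentz factor (taken as 20), which is an order-of-magnitude simplification; the function name is hypothetical.

```python
M_E_C2_EV = 0.51099895e6  # electron rest energy, eV


def upscattered_energy_ev(e_target_ev: float, e_electron_ev: float) -> float:
    """min{E_t (E_e / m_e c^2)^2, E_e}: Thomson boost, capped by the electron energy."""
    lorentz = e_electron_ev / M_E_C2_EV
    return min(e_target_ev * lorentz ** 2, e_electron_ev)


GAMMA_BULK = 20.0
# A 1 keV photon in the laboratory frame is taken to be ~50 eV in the co-moving
# frame (assumed simple division by the bulk Lorentz factor).
e_target_comoving = 1.0e3 / GAMMA_BULK
e_gamma_comoving = upscattered_energy_ev(e_target_comoving, 10.0e9)
# Back to the laboratory frame with the same simplified factor.
e_gamma_lab_gev = GAMMA_BULK * e_gamma_comoving / 1.0e9
print(f"up-scattered photon energy in the laboratory frame: {e_gamma_lab_gev:.0f} GeV")
```

Under these assumptions a 10 GeV co-moving electron yields a photon of a few hundred GeV in the laboratory frame, consistent with emission in the VHE band.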


Target photons 

Target photons of very different energies can be up-scattered to y-rays 
of the same energy. This can be of particular relevance for VHE y-rays 
detected from GRBs, where both the target photons and non-thermal 
electrons probably have broad energy distributions. Assuming a power- 
law distribution for the target photon flux, dn/dE, < £,”, and for elec- 
trons, dn,/dE.« F.”, one finds that the relative contribution to the 
y-ray emission depends on the electron energy as~{1 — [E /(E.I) }E2”"". 
For simplicity, just a single high-energy term in the cross-section was
accounted for (resulting in the factor $1 - [E/(E_{\rm e}\Gamma)]$), which is sufficient
for a qualitative study. However, the obtained dependence shows that
for a reasonable range of photon and electron indices, $1.5 < \gamma, \gamma_{\rm e} < 3$, the
highest-available-energy electrons may provide an important contribution
to the γ-ray energy band by up-scattering photons with energies
within the infrared-to-ultraviolet range.


Klein-Nishina cutoff 

The Klein-Nishina cutoff is a substantial reduction of the Compton
cross-section that occurs when $E_{\rm e}E_{\rm t} \gtrsim \Gamma m_{\rm e}^2c^4$, where $E_{\rm e}$ and $E_{\rm t}$ are the
electron and target photon energies in the co-moving frame and the
laboratory system, respectively. This results in a softening of the γ-ray
spectrum that occurs for $E \gtrsim 50(\Gamma/20)^2[E_{\rm t}/(1\,{\rm keV})]^{-1}$ GeV. Because typically
the GRB synchrotron spectral-energy distribution peaks in the
kiloelectronvolt band, the inverse Compton component detected at
late afterglow phases may be affected by the Klein-Nishina cutoff,
resulting in reduced fluxes and steeper spectra. This may appear to
contradict the relatively hard intrinsic spectral index of $\gamma_{\rm int}$ ≈ 1.6 inferred
from the H.E.S.S. measurement. There are, however, two effects that
can result in spectral hardening at energies around the cutoff: (i) the
up-scattering of low-energy infrared-to-ultraviolet photons, which
give an intrinsic VHE component with the same slope as that seen in
the hard-X-ray band and (ii) the hardness of the electron spectrum at
gigaelectronvolt energies, where adiabatic losses probably render the
electron spectrum hard. The search for consistency within this framework
of the hard VHE spectrum with the SSC scenario, however, requires
detailed dedicated simulations, which are beyond the scope of this
observational paper.
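
An order-of-magnitude check of the energy at which Klein-Nishina effects set in is sketched below. It assumes that the threshold electron energy in the co-moving frame is the bulk Lorentz factor times the squared electron rest energy divided by the laboratory-frame target photon energy, that the up-scattered photon carries about the full electron energy, and that the laboratory-frame energy is larger by one more factor of the bulk Lorentz factor. Numerical factors of order unity are ignored, and the function name is hypothetical.

```python
M_E_C2_EV = 0.51099895e6  # electron rest energy, eV


def klein_nishina_scale_gev(gamma_bulk: float, e_target_ev: float) -> float:
    """Order-of-magnitude photon energy Gamma^2 (m_e c^2)^2 / E_t, in GeV,
    above which the Klein-Nishina suppression becomes important."""
    return gamma_bulk ** 2 * M_E_C2_EV ** 2 / e_target_ev / 1.0e9


e_kn = klein_nishina_scale_gev(20.0, 1.0e3)
print(f"Klein-Nishina scale for 1 keV targets: ~{e_kn:.0f} GeV")
```

The result, about 100 GeV, agrees to within a factor of two with the 50 GeV coefficient quoted above and has the same dependence on the bulk Lorentz factor and the target photon energy.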


Data and code availability 


The raw H.E.S.S. data and the code used in this study are not public, 
but belong to the H.E.S.S. collaboration. All derived higher-level data 
that are shown in the plots will be made available on the H.E.S.S. col- 
laboration’s website upon publication of this study. Data and analysis 
code from the Fermi-GBM and LAT instruments are publicly available. 
Links to the data and software are provided in the Methods section. 
This work also made use of data supplied by the UK Swift Science Data 
Centre at the University of Leicester (http://www.swift.ac.uk/archive/). 

Extended Data Fig. 1 | VHE spectral fit of GRB 180720B. H.E.S.S. spectral fit to
the measured emission in the energy range 100-440 GeV. a, Fit using a simple
power-law model (with photon index $\gamma_{\rm obs}$). b, Fit with a power-law model (with
photon index $\gamma_{\rm int}$) with EBL attenuation for a source at z = 0.653 (ref."%). In both
cases the residual data points with 1σ uncertainties are obtained from the
forward-folded method. The shaded areas show the statistical and systematic
uncertainties in each fit (1σ confidence level). The bottom panels show the
significance of the residuals between the fitted model and the data points.

Extended Data Fig. 2 | EBL absorption coefficient. Absorption coefficient $e^{-\tau}$ for a source emitting at a redshift of 0.653. The values are shown in the energy
range of the detected emission of GRB 180720B (100-440 GeV) for the four EBL models considered”? “.

Extended Data Fig. 3 | CTA detectability prospects. Energy-flux distribution


Extended Data Table 1| VHE spectral information from GRB 180720B 
Spectral parameters of the fits to the H.E.S.S. observed emission in the energy range 100-440 GeV. The intrinsic spectrum with γ = 2.0 (third row) is provided as a reference to the Fermi-LAT
mean photon index detected in several other GRBs at high energies". All reported uncertainties are statistical and systematic, in that order. 
Heat pumps based on magnetocaloric and electrocaloric working bodies (in which
entropic phase transitions are driven by changes of magnetic and electric field,
respectively) use displaceable fluids to establish relatively large temperature spans
between loads to be cooled and heat sinks’”. However, the performance of prototypes
is limited because practical magnetocaloric working bodies driven by permanent
magnets? ° and electrocaloric working bodies driven by voltage® * display
temperature changes of less than 3 kelvin. Here we show that high-quality multilayer
capacitors of PbSc$_{0.5}$Ta$_{0.5}$O$_3$ display large electrocaloric effects over a wide range of
starting temperatures when the first-order ferroelectric phase transition is driven
supercritically (as verified by Landau theory) above the Curie temperature of
290 kelvin by electric fields of 29.0 volts per micrometre. Changes of temperature in
the large central area of the capacitor peak at 5.5 kelvin near room temperature and
exceed 3 kelvin for starting temperatures that span 176 kelvin (complete
thermalization would reduce these values from 5.5 to 3.3 kelvin and from 176 to
73 kelvin). If magnetocaloric working bodies were to be replaced with multilayer
capacitors of PbSc$_{0.5}$Ta$_{0.5}$O$_3$, then the established design principles behind
magnetocaloric heat pumps could be repurposed for better performance without
bulky and expensive permanent magnets.


The development of electrocaloric (EC) cooling devices in past® * and 
recent? times continues to lag the highly developed activity on near- 
room-temperature magnetocaloric (MC) cooling devices’. In parallel, 
the recent development of devices” in which uniaxial stress drives elas- 
tocaloric materials” is complemented by the nascent development of 
devices in which isotropic stress drives barocaloric materials”. Thermal 
changes are particularly large in these two types of mechanocaloric 
material””’, but it is difficult to adapt the highly evolved MC prototypes 
to use mechanocaloric materials instead. 

These MC prototypes’ typically use permanent magnets to address
beds of commercial-grade Gd spheres, whose adiabatic temperature
change of |ΔT| ≈ 2.5 K (ref. *) drives heat exchange with a fluid, permitting
heat to be pumped over much larger temperature spans. These
temperature spans are established either along the fluid alone (passive
regeneration)! or along the bed and the fluid together (active regeneration)*,
such that heat is absorbed from the load at the cold end, and
dumped to the sink at the hot end. MC effects larger than 2.5 K can
be achieved in Gd by increasing the magnetic field and reducing the
demagnetizing factor, but these modifications would be challenging
to use in practical applications (see Supplementary Note 1). EC effects
larger than 2.5 K have not previously been demonstrated in macroscopic
bodies (bulk samples and MLCs) near room temperature if one requires
unambiguous evidence in the form of directly measured temperature
change (Supplementary Note 2).

Bulk EC ceramics such as those used in EC cooling devices are no
thicker than 0.1-0.5 mm, to avoid unduly compromising the breakdown
field’, but the largest applied fields yield at best a directly measured
value of |ΔT| ≈ 2.2 K near room temperature (in PbSc$_{0.5}$Ta$_{0.5}$O$_3$; that is,
PST)* ®”. A larger temperature change may be achieved in order-of-magnitude
thinner EC films”, but there is insufficient active material
for applications (and therefore an innovative device’ based on the
electrostatic actuation of a flexible polymer bilayer would struggle
to continuously cool a macroscopic object). However, an assembly
of EC films in the form of a multilayer capacitor (MLC) represents a
viable working body that is macroscopic”. Although MLCs have now
been exploited in several EC cooling devices*™ “”’, directly measured
temperature jumps $|\Delta T_{\rm j}|$ have been limited to 2.2 K near room temperature
(in MLCs based on a polymer)? and 2.7 K near 380 K (in MLCs
based on 0.9Pb(Mg$_{1/3}$Nb$_{2/3}$)O$_3$-0.1PbTiO$_3$)”. These highly adiabatic
temperature jumps arise within the active area (where interdigitated
electrodes overlap) following rapid thermalization between active and
inactive layers, and are typically measured at the face centres of MLCs.

Here we describe high-quality MLCs based on the well-known
EC material PST° °*4, which is paraelectric near and above room
temperature. On cooling, there is a broad transition to a relaxor state
(with glassy ferroelectric order) if the B-site cation order is low, and
there is a sharp first-order phase transition to a ferroelectric state if the
B-site cation order is high. We used highly ordered PST to access large
EC effects associated with the latent heat of the first-order transition,
which increases with increasing B-site cation order?” as expected for a
first-order transition when varying the degree of disorder”. By driving
this transition supercritically, we accessed about 1.5 times the entropy
associated with this latent heat. To do this while avoiding electrical
breakdown and leakage-induced Joule heating required fine grains
of similar size, a low density of physical and chemical defects, and no
discernible impurity phases. Our sintering and annealing conditions
were optimized to yield all of these properties, as well as high B-site
cation order. We copied the MLC geometry that we previously used
for MLCs of 0.9Pb(Mg$_{1/3}$Nb$_{2/3}$)O$_3$-0.1PbTiO$_3$, in which 19 active layers
yielded a larger value of $|\Delta T_{\rm j}|$ than 14 layers” or 49 layers’, and in which
the layer thickness of about 40 µm falls in a broad range for which the
breakdown field is maximized. However, we increased the active area
at the expense of the surrounding inactive area to generate more EC
heat and reduce internal thermalization, while retaining sufficient
inactive area to suppress cracking by suppressing piezoelectricity.
The inactive area also hinders breakdown by suppressing electrical
discharge between the inner electrodes.

EC effects were directly measured with an infrared camera whose
field of view covered most of the MLC face. The resulting values of
$|\Delta T_{\rm j}|$ reported here describe the face centre and were corroborated
by using a thermocouple. As a result of driving the first-order ferroelectric
phase transition supercritically without electrical breakdown,
we were able to achieve large EC effects over starting temperatures
whose wide range would traditionally be associated with smaller EC
effects in relaxors’. Specifically, our measured changes of temperature
peaked at $|\Delta T_{\rm j}|$ ≈ 5.5 K near room temperature and exceeded $|\Delta T_{\rm j}|$ = 3 K
for starting temperatures that spanned 176 K in the range 294-470 K.
As we show in this paper, the height and width of this peak compare
favourably with respect to the MC effects‘ that can be driven in a macroscopic
volume of commercial-grade Gd using bulky and expensive
permanent magnets, even in the limit of complete internal MLC thermalization.
Our EC capacitors should therefore permit MC prototype
design principles’ to be repurposed for improved performance, without
permanent magnets.

Fig. 1 | MLC structure. a, Optical image, and b, schematic illustration, showing
the uppermost inner electrode (dark brown), part of the next inner electrode
(light brown), the two outer electrodes (grey) and through-thickness PST
(yellow). Image contrast was optimized to distinguish each region. The MLC
shown here is equivalent to MLC1. In b, the white square denotes the face centre
(300 µm × 300 µm), and the four white dots define the corners of a rectangle
that represents the active area in which the inner electrodes overlap. c, Cross-sectional
schematic, showing six rather than all 21 layers of PST.

Fig. 2 | Indirect EC measurements. a, Polarization P(T,E) constructed from
upper branches of 385 unipolar P(E) plots measured isothermally every 0.35 K
on warming (dotted line shows phase boundary with critical endpoint). b, The
reversible isothermal entropy change ΔS(T,E) > 0 due to the removal of field E,
as determined from a. c, Entropy S′(T,E) obtained by subtracting EC entropy
change ΔS(T,E) > 0 from zero-field entropy S′(T) = S(T) − S(270 K)
(Supplementary Fig. 3b); white adiabatic contour separation is about 4.8 K.
d, Reversible adiabatic temperature change $\Delta T(T_{\rm s},E) < 0$ due to the removal of
field E at starting temperature $T_{\rm s}$, as determined from c. Data for MLC1. Selected
P(E) plots in Supplementary Note 7. Selected constant-field cross-sections in
Supplementary Note 8. Abscissae exclude both extremes of the 270-405 K
measurement temperature range, for reliable evaluation of $(\partial P/\partial T)_E$.

Fig. 3 | Direct EC measurements. a, Left inset shows temperature change $\Delta T_{\rm j}$
versus time t when a field of 15.8 V µm⁻¹ was switched on and off. Data were
measured at the MLC face centre with the thermocouple, starting at
temperature $T_{\rm s}$ ≈ 315 K. The main panel shows a detail of this inset (black data),
as well as the corresponding infrared camera measurements for the
300 µm × 300 µm face centre (purple data). Right inset shows an infrared image
of the MLC face at t = 55.1 s, when the face centre (white square overlay) and
surrounding area were coldest (dark blue). The black rectangular overlay
outlines the active area. IR, infrared; TC, thermocouple. b, For different values
of $T_{\rm s}$, we plot temperature jump magnitude $|\Delta T_{\rm j}|$ when a field of 15.8 V µm⁻¹ or
29.0 V µm⁻¹ was switched on and later off. Data were measured at the MLC face
centre using the infrared camera or thermocouple. Black line shows 0.90|ΔT|
versus $T_{\rm s}$ for 15.8 V µm⁻¹, identified using |ΔT| from the indirect method (Fig. 2d)
after scaling by 0.90 to achieve a least-squares fit with field-off values of $|\Delta T_{\rm j}|$
(blue circles). All data for MLC1. Similar data for five similar MLCs appear in
Supplementary Fig. 8.

MLC fabrication, characterization and measurement are described
in the Methods; the challenges of fabrication are discussed in Supplementary
Note 3; and our direct EC measurement set-ups are shown in
Supplementary Note 4. A plan-view optical image (Fig. 1a) and schematic
(Fig. 1b) accompany a cross-sectional schematic (Fig. 1c) showing
six rather than all 21 layers of PST. The 19 active layers, of average
thickness 37.9 µm, were electrically addressed by interdigitated inner
electrodes of Pt (about 2 µm thick), the active area was approximately
49 mm², and the active volume occupied 54% of the total MLC volume
(10.45 mm × 7.43 mm × 0.84 mm).
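
The quoted active-volume fraction follows from the geometry given above. The short sketch below (with a hypothetical function name) multiplies the number of active layers by their average thickness and the active area, and divides by the outer MLC volume.

```python
def active_volume_fraction(n_layers, layer_thickness_mm, active_area_mm2, dims_mm):
    """Fraction of the MLC volume occupied by active PST layers."""
    active_volume = n_layers * layer_thickness_mm * active_area_mm2
    length, width, height = dims_mm
    return active_volume / (length * width * height)


# 19 active layers of 37.9 um (0.0379 mm), 49 mm^2 active area, outer dimensions in mm.
fraction = active_volume_fraction(19, 37.9e-3, 49.0, (10.45, 7.43, 0.84))
print(f"active volume fraction: {fraction:.0%}")
```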

X-ray diffraction revealed a high degree of B-site cation order (order
parameter $S_{111}$ ≈ 0.96; see Supplementary Note 5). This represents a
substantial improvement with respect to MLCs of PST ($S_{111}$ = 0.6-0.7),
for which |ΔT| = 2.4 K in the active volume” was identified via a type
of correction that can be prone to overestimation”. Zero-field calorimetry
revealed a Curie temperature and latent heat ($T_{\rm C}$ = 290 K and
$|Q_0|$ = 10.2 MJ m⁻³, inset of Supplementary Fig. 3a) that differ only
slightly from the corresponding values for bulk PST with similar B-site
cation order” ($T_{\rm C}$ = 298 K and $|Q_0|$ = 10.0 MJ m⁻³, assuming our density
of 8,750 kg m⁻³). Dielectric measurements (Supplementary Fig. 3c)
recorded a Curie temperature of $T_{\rm C}$ ≈ 292 K, a small loss tangent of 0.05
just below 7,, and a much smaller loss tangent of <10* above 7,, where 
we will now describe large EC effects. 

We will first use the well-known indirect method? to evaluate EC
effects for the active volume (Fig. 2), implementing three improvements
(see Methods) that some of us introduced in ref. ”. Isothermal
electrical polarization P(E) was measured every 0.35 K at 385 values of
increasing temperature T, and field-removal branches obtained under
unipolar conditions (0 ≤ E ≤ 15.8 V µm⁻¹) were used to plot P(T,E) (Fig. 2a).
Inflection points in these field-removal branches were used to identify
the phase boundary E(T) and critical endpoint. Combining the gradient
dE/dT = 0.15 V µm⁻¹ K⁻¹ of this boundary near $T_{\rm C}$ with a field-driven
polarization change of |ΔP| ≈ 24.0 µC cm⁻² (Supplementary Note 9)
implies, via the Clausius-Clapeyron equation |dE/dT| = |ΔS|/|ΔP|, an
isothermal entropy change of $|\Delta S_0|$ ≈ 36 kJ K⁻¹ m⁻³, in good agreement
with the thermally driven entropy change of $|\Delta S_0|$ ≈ 35 kJ K⁻¹ m⁻³ (from
$|Q_0|$ = 10.2 MJ m⁻³ and $T_{\rm C}$ = 290 K; inset of Supplementary Fig. 3a).
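
The agreement between the field-driven and thermally driven entropy changes can be verified with SI-unit arithmetic, as in the sketch below. The function names are hypothetical; the inputs are the values quoted in the text, converted to V m⁻¹ K⁻¹, C m⁻² and J m⁻³.

```python
def entropy_change_from_boundary(de_dt_v_per_m_k, delta_p_c_per_m2):
    """Clausius-Clapeyron: |dS| = |dE/dT| * |dP|, in J K^-1 m^-3."""
    return de_dt_v_per_m_k * delta_p_c_per_m2


def entropy_change_from_latent_heat(latent_heat_j_per_m3, t_curie_k):
    """Thermally driven entropy change |Q0| / T_C, in J K^-1 m^-3."""
    return latent_heat_j_per_m3 / t_curie_k


# 0.15 V um^-1 K^-1 = 0.15e6 V m^-1 K^-1; 24.0 uC cm^-2 = 0.24 C m^-2.
ds_field = entropy_change_from_boundary(0.15e6, 24.0e-2)
ds_thermal = entropy_change_from_latent_heat(10.2e6, 290.0)
print(f"field-driven: {ds_field / 1e3:.0f} kJ/K/m^3, thermal: {ds_thermal / 1e3:.0f} kJ/K/m^3")
```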

Our dense P(T,E) data (Fig. 2a) were used to evaluate the reversible
isothermal entropy change $\Delta S(T,E) = \int_E^0 (\partial P/\partial T)_{E'}\,{\rm d}E' > 0$ (Fig. 2b) for
field removal ($E \to 0$) at temperature T, using the Maxwell relation
$(\partial S/\partial E)_T = (\partial P/\partial T)_E$. The largest field used in our indirect measurements
(E = 15.8 V µm⁻¹) yields an entropy change whose magnitude reaches a
peak of |ΔS| ≈ 43 kJ K⁻¹ m⁻³ near 300 K, while the corresponding
refrigerant capacity is $\int_{T_1}^{T_2}|\Delta S(T)|\,{\rm d}T \approx 2.8$ MJ m⁻³ ($T_1$ and $T_2$ define the
FWHM of |ΔS(T)| for E = 15.8 V µm⁻¹). Subtracting the entropy change
on field removal ΔS(T,E) > 0 from the zero-field entropy
S′(T,0) = S(T,0) − S(270 K,0) (Supplementary Fig. 3b) yields entropy
map S′(T,E) (Fig. 2c; adiabatic contours are white). Following an adiabatic
contour identifies the reversible adiabatic temperature change for
the active regions, yielding $\Delta T(T_{\rm s},E) < 0$ (Fig. 2d) for the removal of field
E at starting temperature $T_{\rm s}$. For E = 15.8 V µm⁻¹, the peak temperature
change that we identify using the indirect method is |ΔT| ≈ 4.8 K near
309 K.
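
The indirect method amounts to a numerical integration of the temperature derivative of the polarization over field. The sketch below illustrates this with central differences in temperature and the trapezoid rule in field, using a made-up polarization function (a Curie-like response, chosen only because its integral is known analytically) in place of measured P(T,E) data. All names and values are hypothetical, and the improvements to the indirect method mentioned in the text are not reproduced.

```python
def isothermal_entropy_change(polarization, temperature, field_max, d_temp=0.35, n_steps=200):
    """Entropy change on removing field_max at fixed temperature:
    Delta S = -integral_0^E (dP/dT)_{E'} dE', evaluated with central
    differences in temperature and the trapezoid rule in field."""
    d_field = field_max / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        field = i * d_field
        dp_dt = (polarization(temperature + d_temp, field)
                 - polarization(temperature - d_temp, field)) / (2.0 * d_temp)
        weight = 0.5 if i in (0, n_steps) else 1.0
        total += weight * dp_dt * d_field
    return -total


def toy_polarization(temperature, field):
    """Hypothetical Curie-like response P = c * E / T (SI units); not measured data."""
    return 1.0e-5 * field / temperature


ds = isothermal_entropy_change(toy_polarization, 300.0, 15.8e6)
# Analytic result for the toy model: c * E^2 / (2 T^2).
expected = 1.0e-5 * (15.8e6) ** 2 / (2.0 * 300.0 ** 2)
print(f"numerical: {ds:.1f} J/K/m^3, analytic: {expected:.1f} J/K/m^3")
```

For real data the function argument would be replaced by an interpolation of the measured field-removal branches.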

Thermocouple measurements of temperature change $\Delta T_{\rm j}$ versus
time t recorded temperature jumps of $|\Delta T_{\rm j}|$ ≈ 4.0 K at the MLC face
centre, when driving a four-step Brayton cycle for which E ≈ 15.8 V µm⁻¹
and $T_{\rm s}$ ≈ 315 K (left inset, Fig. 3a). This well-known cycle comprised (1) a
highly adiabatic field-on temperature jump $\Delta T_{\rm j}$ > 0, (2) a slow isofield
return to $T_{\rm s}$, (3) a highly adiabatic field-off temperature jump $\Delta T_{\rm j}$ < 0
and (4) a slow zero-field return to $T_{\rm s}$. The absence of Joule heating, while
dumping EC heat in step (2), is important for applications.

Infrared camera measurements at the MLC face centre (300 µm ×
300 µm) recorded a slightly larger value of |ΔT_j| ≈ 4.3 K (main panel,
Fig. 3a), reflecting improved adiabaticity (owing to faster data
acquisition by a factor of 11) and reduced thermal mass (approximately
6-µm-thick black paint with relatively low volumetric heat capacity
versus the thermocouple affixed with a <0.05 mm³ drop of black paint).
The inactive layers in the active area render this value 10% smaller than
the indirectly measured value of |ΔT| = 4.8 K for the active volume
(Fig. 2d), yielding |ΔT_j| ≈ 0.90|ΔT| as roughly expected (see Methods).
By contrast, inactive thermal mass outside the active area only dimin- 
ished the magnitude of the temperature change near the periphery of 
the active area (right inset, Fig. 3a). 

Infrared camera measurements at different starting temperatures 
recorded similar values of |A7,| for field application/removal (red and 
blue data, Fig. 3b). The crossover in the magnitude of these field applica- 
tion/removal values (seen previously for Gd)”’ was also observed when 
varying field (Supplementary Note 10) and is explained in Supplementary
Note 11. For our intermediate field of E ≈ 15.8 V µm⁻¹, values of |ΔT_j|
(red and blue circles, Fig. 3b) match well with values of 0.90|ΔT| from the
Fig. 4 | Large EC effects over a wide range of operating temperatures.

a, Effective EC temperature change |ΔT_eff| as a function of starting temperature
T_s, for fields of E = 15.8 V µm⁻¹ and 29.0 V µm⁻¹. Upper bounds from highly
adiabatic infrared measurements of |ΔT_j| (field-off data, Fig. 3b) assume
thermalization in the active area alone before useful heat transfer. Lower
bounds of 0.60|ΔT_j| assume complete internal thermalization before useful
heat transfer. Data are for MLC1. Similar data for five similar MLCs appear in


indirect method (black line, Fig. 3b), and slightly exceed values of |ΔT_j|
from thermocouple measurements (brown and black circles, Fig. 3b)
due to the improved adiabaticity and reduced thermal mass that we
explained earlier. For T_s near 330 K, our maximum field of E ≈ 29.0 V µm⁻¹
yielded our highest value of |ΔT_j| ≈ 5.5 K (Fig. 3b). Similar results for five
similar MLCs appear in Supplementary Fig. 8.

Our directly measured value of |ΔT_j| = 5.5 K exceeds the MC benchmark
of |ΔT| = 2.5 K for Gd in prototypes (Supplementary Note 1).
Moreover, it represents an improvement over other macroscopic
EC bodies at any starting temperature, including those for which
temperature changes of |ΔT_j| ≈ 2.2 K have been directly measured
near room temperature (Supplementary Note 2). For the active volume
alone, our maximum field of 29.0 V µm⁻¹ yields peak values of
|ΔT| = |ΔT_j|/0.90 = 6.1 K near 330 K, |ΔS| = 53 kJ K⁻¹ m⁻³ near 308 K
(Supplementary Note 13), and |Q| = T|ΔS| = 16.6 MJ m⁻³ near 330 K. The
corresponding electrical work |W| = 2.5 MJ m⁻³ near 330 K (Supplementary
Note 14) implies an isothermal materials efficiency |Q|/|W| ≈ 7 that is
similar to the values identified for other EC or elastocaloric materials,
and slightly smaller than the values identified for MC materials driven
by permanent magnets”. 
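
The figures of merit in this paragraph follow from simple ratios of the quoted values. The sketch below reproduces them; the variable names are illustrative, and the entropy change at 330 K is inferred here from |Q|/T rather than quoted in the text.

```python
# Values quoted in the text for the maximum field of 29.0 V um^-1.
DT_JUMP = 5.5          # K, directly measured |dT_j| near 330 K
AREA_FACTOR = 0.90     # empirical |dT_j| / |dT| ratio for the active area
Q_HEAT = 16.6e6        # J m^-3, isothermal heat |Q| near 330 K
W_WORK = 2.5e6         # J m^-3, electrical work |W| near 330 K
T_OP = 330.0           # K

dt_active = DT_JUMP / AREA_FACTOR      # temperature change of the active volume
efficiency = Q_HEAT / W_WORK           # isothermal materials efficiency |Q|/|W|
ds_at_t_op = Q_HEAT / T_OP             # entropy change implied by |Q| = T|dS|

print(f"|dT| active volume: {dt_active:.1f} K")
print(f"|Q|/|W|:            {efficiency:.1f}")
print(f"|dS| at 330 K:      {ds_at_t_op / 1e3:.1f} kJ K^-1 m^-3")
```

The implied entropy change at 330 K is slightly below the 53 kJ K⁻¹ m⁻³ peak, which is consistent with that peak occurring at a different temperature (near 308 K).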

Our large EC effects are predicated on the large latent heat associ- 
ated with high B-site cation order”*”®, such that reducing the B-site 
cation order reduces the EC effects (Supplementary Note 15), as seen 
for bulk PST. However, the entropy change (|ΔS_0| = 35 kJ K⁻¹ m⁻³) that
corresponds to our zero-field latent heat (|Q_0| = 10.2 MJ m⁻³) accounts
for only about 2/3 of our largest value of |ΔS| = 53 kJ K⁻¹ m⁻³, revealing that
there are also substantial EC effects associated with the enhancement
of polarization in the transformed phase (see |ΔT_j| versus E, Supplementary
Note 10). Substantial caloric effects associated with a single phase


Table 1 | Performance summary

Variable         MLC of PST         MLC of PST         Gd in prototypes
Driving field    E = 15.8 V µm⁻¹    E = 29.0 V µm⁻¹    µ0H_app = 1.4 T
|ΔT_eff|         2.6-4.3 K          3.3-5.5 K          2.5 K
|ΔS|             43 kJ K⁻¹ m⁻³      53 kJ K⁻¹ m⁻³      21 kJ K⁻¹ m⁻³

For an MLC of PST driven using electric field E, we present peak upper and lower bounds on
effective temperature change |ΔT_eff| (Fig. 4a), and peak entropy change |ΔS| normalized by
active volume (Supplementary Note 13). EC data are for MLC1. For comparison, we present
the corresponding data near 291 K for a bed of commercial-grade Gd spheres driven with
permanent magnets. The internal magnetic field of µ0H_int = 1.0 T corresponds to an applied
magnetic field of µ0H_app = 1.4 T (where µ0 is the permeability of free space).
Supplementary Fig. 9. For comparison, we plot |ΔT_eff| versus T_s for a bed of
commercial-grade Gd spheres driven with permanent magnets, where the
internal magnetic field of µ0H_int ≈ 1.0 T corresponds to an applied magnetic
field of µ0H_app ≈ 1.4 T (and where µ0 is the permeability of free space). b, The
range of T_s for which a given value of |ΔT_eff| is exceeded, with upper and lower
bounds shaded as for a. This description of peak width provides more
information than a single value of refrigerant capacity.


away from absolute zero are well known in mechanocaloric materials 
(near and away from phase transitions)” and in MC materials (near 
phase transitions)”, but not in EC materials. 

We will now consider the effective temperature change for applications
|ΔT_eff|, which lies between two bounds that define the shaded
regions in Fig. 4a. The upper bound assumes that the active area
thermalizes and then exchanges heat with its intended target before
exchanging any heat with the inactive area, such that |ΔT_eff| = |ΔT_j|. The
lower bound assumes that the active volume completely thermalizes
with the inactive volume, such that |ΔT_eff| = 0.54|ΔT| = 0.6|ΔT_j| given
an active volume of 54% for which |ΔT| = |ΔT_j|/0.90. Upper and lower
bounds for the five similar MLCs are presented in Supplementary Fig. 9. 
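
The two bounds described above reduce to a short calculation. The sketch below applies the 0.90 area factor and the 54% active volume fraction quoted in the text to the measured peak jumps; the function name and constants are illustrative, and the lower bound inherits the text's implicit assumption that complete thermalization simply scales the temperature change by the active volume fraction.

```python
AREA_FACTOR = 0.90              # |dT_j| = 0.90 |dT| (inactive layers in active area)
ACTIVE_VOLUME_FRACTION = 0.54   # fraction of the MLC volume that is EC-active


def effective_bounds(dt_jump):
    """Return (lower, upper) bounds on |dT_eff| for a measured jump |dT_j|."""
    upper = dt_jump                              # active area thermalizes alone
    dt_active = dt_jump / AREA_FACTOR            # active-volume temperature change
    lower = ACTIVE_VOLUME_FRACTION * dt_active   # complete internal thermalization
    return lower, upper


for field, dt_jump in (("15.8 V/um", 4.3), ("29.0 V/um", 5.5)):
    lower, upper = effective_bounds(dt_jump)
    print(f"E = {field}: {lower:.1f} K <= |dT_eff| <= {upper:.1f} K")
```

With the measured peak jumps of 4.3 K and 5.5 K, this gives the 2.6-4.3 K and 3.3-5.5 K ranges listed in Table 1.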

Let us now consider MLC performance with our maximum field
(E = 29.0 V µm⁻¹) while bearing in mind that complete thermalization
may be unduly pessimistic because the large, thin active area would
make intimate contact with its intended target, for example
heat-exchange fluid. Values of |ΔT_eff| peak at 5.5 K if we assume thermalization
of the active area only (3.3 K for complete thermalization). Moreover,
values of |ΔT_eff| remain large over a wide range of starting temperatures
(Fig. 4b), exceeding 3 K for starting temperatures that span 176 K if 
we assume thermalization of the active area only (73 K for complete 
thermalization). Large temperature spans have been hitherto accessed 
through the broad transitions associated with smaller EC effects in 
relaxors’, but here we access a large temperature span by driving a 
first-order transition using supercritical fields (Fig. 2a), as verified for 
our MLCs by using Landau theory to model the active volume (Sup- 
plementary Note 16). 

Figure 4 and Table 1 compare |ΔT_eff| with temperature jumps
|ΔT_eff| = |ΔT_j| = |ΔT| for a bed of commercial-grade Gd spheres. The
bed was addressed by fields that could be achieved with permanent
magnets, such that it represents the archetypal MC working body in
prototype devices. Our peak upper bound of |ΔT_eff| = 5.5 K is roughly
double the peak value of |ΔT_eff| = 2.5 K for Gd in prototypes, and our
values of |ΔT_eff| fall considerably more slowly when the starting
temperature is varied. Even if we conservatively assume both our lower
field (E = 15.8 V µm⁻¹) and complete internal thermalization, our peak
value of |ΔT_eff| = 2.6 K is similar to the peak value for Gd.

In summary, we have demonstrated large EC effects in high-quality
MLCs of PST. These effects remain large over a wide range of start- 
ing temperatures, such that they could be used for active or passive 
regeneration in which a large temperature span would be accessed 
with just a single type of MLC. If MLCs of PST were to replace Gd in MC 
prototypes, then the impressive developments of recent decades
could be exploited by using larger caloric effects over a wider range of 
temperatures. There would be no need for bulky and expensive per- 
manent magnets*°, and the dynamic field profile in an active regenera- 
tor’ could be tailored at will because the constituent MLCs would be 
individually addressable. Our operating temperatures are relevant 
for cooling consumer electronics and solar cells, and can be reduced 
below room temperature by using doped PST”?*’. MLCs of PST could 
be further developed by increasing the breakdown field through pro- 
cess optimization, reducing the inactive volume by automated mass 
production, increasing the active area, and tuning the number of layers 
to match thermal conductance with system design. 
Methods


Samples 

All data were obtained using MLC1, except when verifying reproduc- 
ibility (MLCs 2-6, Supplementary Note 12), evaluating electrical work 
(MLC7, Supplementary Note 14) and investigating the effect of reduced 
B-site cation order (MLC8, Supplementary Note 15). The X-ray diffrac- 
tion and heat capacity data were obtained after MLCs had been crushed 
to form a powder. The optical image in Fig. 1a was obtained using an
MLC that was similar to MLC1. 


MLC fabrication 

MLCs of PST were prepared by solid-state reaction and tape casting. 
Stoichiometrically weighed powders of Pb₃O₄, Sc₂O₃ and Ta₂O₅ were
ball-milled in distilled water for 17 h with balls of partially stabilized
zirconia. The resulting slurry was dried and calcined at 850 °C for 4 h to
obtain PST powder. This powder was ball-milled for 24 h in an organic
solvent with a binder, and the resulting slurry was used to form green
sheets of PST with a 300-µm-gap doctor blade. After screen-printing
inner electrodes of Pt paste, we stacked, pressed and cut the green
sheets to obtain green chips. Next, the binder was burned off at 500 °C
for 4 h. Proto-MLCs were then sintered at 1,400 °C for 4 h and
subsequently annealed at 1,000 °C for 100-1,000 h, while surrounded
with a mixture of Pb₃O₄ and ZrO₂ powders whose Pb:Zr ratio was 1:1
(the resulting PbZrO₃ reaction product prevented lead loss). However,
MLC8 with reduced B-site cation order (Supplementary Note 15) did 
not undergo the anneal at 1,000 °C. Outer electrodes were fabricated 
with silver paste. The challenges of MLC fabrication are discussed in 
Supplementary Note 3. 


PST grain size and density 

An average grain size of 2-3 µm was identified by scanning electron
microscopy. We calculated an average value of density ρ = 8,750 kg m⁻³
by using Archimedes’ principle to identify ρ = (w ρ_water)/(w − v) for three
samples (5 mm × 5 mm × 0.3 mm) fabricated from thick green sheets
without electrodes. Here, w denotes the sample weight in air, v denotes
the apparent sample weight in purified water, and ρ_water is the density
of water.
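
The Archimedes formula above can be illustrated with a short calculation. The sketch below uses hypothetical weighings constructed for a 5 mm × 5 mm × 0.3 mm sample of the stated density; the masses and the water density of 997 kg m⁻³ are assumptions for illustration, not measured values from the paper.

```python
def archimedes_density(w_air, w_water, rho_water=997.0):
    """Density from weight in air (w) and apparent weight in water (v).

    Implements rho = w * rho_water / (w - v); weights may be in any
    consistent unit (here kg), and rho_water is in kg m^-3.
    """
    if w_air <= w_water:
        raise ValueError("weight in air must exceed apparent weight in water")
    return w_air * rho_water / (w_air - w_water)


# Hypothetical weighings for a sample of the quoted size and density.
volume = 5e-3 * 5e-3 * 0.3e-3          # m^3
w = 8750.0 * volume                    # kg, weight in air
v = w - 997.0 * volume                 # kg, apparent weight in water (buoyancy removed)

rho = archimedes_density(w, v)
print(f"density = {rho:.0f} kg m^-3")
```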


Temperature control for isothermal measurements of dielectric 
constant and electrical polarization 

We used a bespoke cryogenic probe™ with which each MLC made good 
thermal contact. To establish an appropriate timescale for isothermal 
measurements, the thermal relaxation time (approximately 2.2 s for 
1/e decay) was identified at room temperature from infrared measure- 
ments of quasi-adiabatic temperature change in air (not shown). The 
use of vacuum had nominally no effect on thermal relaxation time. 


Temperature control for highly adiabatic EC measurements 
with the thermocouple and infrared camera 

The MLC was suspended approximately 0.9 mm above the heater block 
of a bespoke heating stage that was open to air (Supplementary Fig. 1).
The thermal relaxation time (about 7 s for 1/e decay) was identified 
from thermocouple measurements of temperature change (left inset 
of Fig. 3a). 


Dielectric measurements 

These were performed with an Agilent 4294A analyser, by sweeping the 
frequency from 100 Hz to 100 kHz. Data were collected approximately
every 0.12 K while cooling and subsequently heating at the slow rate of
±1 K min⁻¹ in our cryogenic probe.


X-ray diffraction measurements 
These were performed with Cu-Ka radiation using a Bruker D8 Advance 
diffractometer equipped with a LYNXEYE EX detector. The intensities
of the 111 and 200 reflections were determined by fitting a pseudo-Voigt
function using HighScore Plus software.


Differential scanning calorimetry 

This was performed using a TA Instruments Q2000 DSC that was 
calibrated via the melting transition of an indium reference sample. 
Data were obtained on increasing the temperature at 10 K min⁻¹.
Following standard practice, heat flux dQ/dt was normalized by
temperature ramp rate dT/dt to yield dQ/dT. Integration of dQ/dT yielded
latent heat |Q_0| = 10.2 MJ m⁻³ after subtracting a sigmoidal baseline
(inset, Supplementary Fig. 3a). Calibration with a sapphire reference
permitted dQ/dT to be recast as zero-field heat capacity c(T)
(Supplementary Fig. 3a). From c(T), we obtained the zero-field entropy
$S'(T) = S(T) - S(270\ \mathrm{K}) = \int_{270\ \mathrm{K}}^{T} c(T')/T'\,\mathrm{d}T'$
(Supplementary Fig. 3b) with respect to the entropy S at 270 K.
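
The entropy integral above can be evaluated from tabulated heat capacity with a cumulative trapezoid rule. The sketch below is a minimal illustration that uses a constant, hypothetical heat capacity of 2.7 MJ K⁻¹ m⁻³ (the off-peak value quoted later in Methods) so that the result can be compared with the closed form c·ln(T/270 K); real c(T) data would include the transition peak.

```python
import math


def zero_field_entropy(temps, heat_capacity):
    """Cumulative trapezoid of c(T')/T' from temps[0] to each temps[i].

    temps         : ascending temperatures (K), temps[0] is the reference
    heat_capacity : volumetric heat capacity c at each temperature (J K^-1 m^-3)
    Returns S'(T) = S(T) - S(temps[0]) in J K^-1 m^-3.
    """
    entropy = [0.0]
    for i in range(1, len(temps)):
        f_prev = heat_capacity[i - 1] / temps[i - 1]
        f_curr = heat_capacity[i] / temps[i]
        entropy.append(entropy[-1] + 0.5 * (f_prev + f_curr) * (temps[i] - temps[i - 1]))
    return entropy


# Hypothetical constant heat capacity on a fine grid from 270 K to 405 K.
C_OFF_PEAK = 2.7e6
temps = [270.0 + 0.05 * i for i in range(2701)]
c_values = [C_OFF_PEAK] * len(temps)

s_prime = zero_field_entropy(temps, c_values)
exact = C_OFF_PEAK * math.log(temps[-1] / temps[0])
print(s_prime[-1], exact)
```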


Indirect EC measurements 

Using a Keithley 2410 SourceMeter, highly isothermal measurements
of electrical polarization were obtained at constant current (10 µA
above T_0, 5 µA below T_0) on warming from 270 K to 405 K in our
cryogenic probe, such that approximately every 0.35 K we measured
one bipolar cycle in ±600 V, two unipolar cycles out to +600 V, and
two unipolar cycles out to −600 V. Bipolar cycles were used to centre
unipolar cycles on the polarization axis, and P(T,E) data for indirect
EC measurements were harvested from the second positive unipolar
cycle (examples of bipolar and unipolar cycles appear in Supplementary
Fig. 4). The small constant current limited the instantaneous
speed of the field-driven transition to yield approximately isothermal
conditions (the duration of ≈20 s for each cycle branch was >9
times as large as the thermal relaxation time of approximately 2.2 s
for 1/e decay).


Improvements to the indirect method 

Three improvements that some of us introduced elsewhere permit
excellent agreement between our indirect EC measurements and our
direct measurements (comparisons appear in Fig. 3b and Supplementary
Fig. 7). First, P(E) was measured approximately every 0.35 K, in
contrast with the standard practice of using 10 K increments, resulting
in a dense map of P(T,E) (Fig. 2a) that we used to construct a dense
map of ΔS(T,E) (Fig. 2b). Second, we used unipolar not bipolar P(E)
measurements (Supplementary Note 7), thus minimizing field hysteresis
to strengthen the single-valued assumption on P(T,E). Third,
we evaluated ΔT(T_s,E) (Fig. 2d) by following adiabatic contours on the
entropy map S′(T,E) (Fig. 2c) that we created by subtracting ΔS(T,E)
(Fig. 2b) from S′(T,0) (Supplementary Fig. 3b), thus improving on the
standard practice of identifying ΔT ≈ −TΔS/c under the assumption of
some single effective value for the specific heat capacity c.
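
For contrast with the contour-following approach, the standard single-heat-capacity estimate mentioned above is a one-line formula. The sketch below evaluates it with values quoted elsewhere in the text (|ΔS| ≈ 43 kJ K⁻¹ m⁻³ near 300 K, and the off-peak c = 2.7 MJ K⁻¹ m⁻³ from Methods); choosing that particular c is an assumption made here for illustration, and the estimate would change with any other effective value.

```python
def standard_estimate_dt(temperature, delta_s, heat_capacity):
    """Standard-practice estimate dT ~ -T * dS / c with a single effective c."""
    return -temperature * delta_s / heat_capacity


T_PEAK = 300.0       # K, temperature near which |dS| peaks at 15.8 V um^-1
DELTA_S = 43.0e3     # J K^-1 m^-3, entropy change on field removal (positive)
C_EFFECTIVE = 2.7e6  # J K^-1 m^-3, assumed single effective heat capacity

dt_estimate = standard_estimate_dt(T_PEAK, DELTA_S, C_EFFECTIVE)
print(f"dT estimate = {dt_estimate:.2f} K")
```

The negative sign reflects cooling on field removal. The estimate depends directly on the assumed c, which is the weakness that the adiabatic-contour method avoids near a transition where c varies strongly with temperature.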


Direct thermocouple measurements of temperature change 

The measurement set-up appears in Supplementary Fig. 1a. A bespoke 
K-type thermocouple was monitored at approximately 4.5 Hz to record
EC cycles roughly every 5 K on warming from 295 K to 388 K. These
cycles were driven using a current of magnitude 10 mA from a Keithley
2410 SourceMeter. The insulation of the 40-µm-diameter thermocouple
wires was removed within about 2 cm of the weld to reduce
thermal mass. The weld was pressed onto the centre of the MLC face 
and attached with a drop of matt black paint (PNM400, Electrolube) 
for good thermal contact. To prepare the MLC for subsequent infrared 
measurements, the thermocouple and black paint were removed from 
the MLC by using acetone and then isopropyl alcohol. 


Direct infrared measurements of temperature change 
The measurement set-up appears in Supplementary Fig. 1b. We used 
an infrared camera (SC7500, FLIR) operating at 50 Hz to image EC 
cycles approximately every 5-10 K on warming in air. These cycles
were driven using a current of magnitude 1-10 mA from a Keithley 2410
SourceMeter. To increase emissivity, two layers of matt black paint
(PNM400, Electrolube) were spin-coated on the MLC face, resulting in
a total thickness of approximately 6 µm. All infrared data represent an
average within the 300 µm × 300 µm face centre (white square, Fig. 1b),
with the exception of the infrared image that we present in the right 
inset of Fig. 3a. Supplementary Note 17 explains how we achieved good 
calibration across starting temperatures that spanned 188 K, and how 
we identified an emissivity of 0.84-0.87 for the black paint. 


Inactive thermal mass in the active area 

Inside and not near the periphery of the active area, complete
thermalization between the 19 active layers of PST and the inactive layers (two
outer layers of PST, 20 inner electrodes of Pt) implies |ΔT_j| ≈ 0.86|ΔT|,
assuming an off-peak specific heat capacity of c = 2.7 MJ K⁻¹ m⁻³ for PST
(Supplementary Fig. 3a), and c = 2.8 MJ K⁻¹ m⁻³ for Pt. The prediction
of |ΔT_j| = 0.86|ΔT| is similar to our empirical finding of |ΔT_j| ≈ 0.90|ΔT|.


Data availability 


Source data for Figs. 2-4 are provided with the paper. All other relevant 
data are available within the paper and its Supplementary Information 
files. 
Traditional technologies for virtual reality (VR) and augmented reality (AR) create
human experiences through visual and auditory stimuli that replicate sensations 
associated with the physical world. The most widespread VR and AR systems use head- 
mounted displays, accelerometers and loudspeakers as the basis for three- 
dimensional, computer-generated environments that can exist in isolation or as 
overlays on actual scenery. In comparison to the eyes and the ears, the skin is a
relatively underexplored sensory interface for VR and AR technology that could,
nevertheless, greatly enhance experiences at a qualitative level, with direct relevance
in areas such as communications, entertainment and medicine. Here we present a
wireless, battery-free platform of electronic systems and haptic (that is, touch-based) 
interfaces capable of softly laminating onto the curved surfaces of the skin to 
communicate information via spatio-temporally programmable patterns of localized 
mechanical vibrations. We describe the materials, device structures, power delivery 
strategies and communication schemes that serve as the foundations for such 
platforms. The resulting technology creates many opportunities for use where the skin 
provides an electronically programmable communication and sensory input channel 
to the body, as demonstrated through applications in social media and personal 
engagement, prosthetic control and feedback, and gaming and entertainment. 


An important future for VR/AR lies in the development of a full, immersive
experience that includes not only interactive images and sounds,
but also sensations of touch. The consequences of technologies with 
multi-sensory capabilities of this type will be far reaching, across fields 
ranging from social media and communications, to gaming and enter- 
tainment, and to clinical medicine, rehabilitation and recovery’”. The 
skin is the largest organ of the body, and mechanoreceptors distrib- 
uted across the skin, within the dermis, form the basis of our physical 
interactions with the world. Specifically, responses to spatio-temporal 
patterns of force on the skin transmit to the brain as signals that define
a mechanical sense of our surroundings. Efforts to integrate
electronically programmable interfaces to mechanoreceptors within a
comprehensive VR/AR platform are, however, in their infancy compared
to those associated with video and audio interfaces. Some approaches
rely on collections of wired electrodes pressed against the skin to induce
artificial, vibration-like sensations via electrostimulation, known as 
electrotactile effects*®”. Variability of the impedance of the skin across 
the body and between individuals, along with time dependent drifts 
in this quantity due to changes in the properties of the skin or the 
electrode surfaces, represent confounding challenges in selecting 
appropriate combinations of voltages and currents that create desired 
responses without pain or electrically induced lesions®. A promising 
alternative relies on mechanical forces, in the form of vibratory actua- 
tion imparted to the skin by electrical motors or piezoelectric devices, 
where relays, bulk wires and battery packs couple loosely to the body 
through textiles, tapes and straps to provide the necessary control 
Fig. 1 | Design and architecture of an epidermal VR system. a, Exploded-view
schematic illustration of a device with 32 independently controlled haptic
actuators. b, Schematic illustration of the NFC electronics and circuit; the main
circuit components are labelled 1-6. c, d, Optical images of an NFC coil before
(c) and after (d) integrating the electronic components. e, Exploded-view
schematic diagram of a haptic actuator. f, g, Schematic diagram of an actuator


systems and power supplies””. As with related supporting hardware for 
electrotactile interfaces, the cumbersome nature of this type of tech- 
nology and the limited ability to scale to monolithic, manufacturable 
platforms with large numbers of independently controlled actuators 
represent disadvantages that will hinder widespread adoption. 

Here we introduce a set of materials, device designs, integration 
schemes and system layouts for wirelessly controlled and wirelessly 
powered, battery-free, haptic interfaces that incorporate large arrays
of millimetre-scale vibratory actuators in soft, conformal sheets of
electronics that laminate directly onto the skin in a simple, non-invasive
and reversible manner. Multiple systems of this type, interfaced onto 
desired locations on the body with full, programmable control via a 
remote computer system, establish means to extend VR/AR experiences 
beyond visual and auditory sensations, with broad application possi- 
bilities. Figure 1a presents schematic illustrations of a representative 
viewed from above (f) and below (g). h, i, Optical images of an actuator viewed
from above (h) and below (i). j-o, Optical images (top row) and FEA results
(bottom row) of an epidermal VR device under bending (j, m), folding (k, n) and
twisting (l, o). The colour in m-o represents the equivalent strain, and the
insets show the areas with relatively high strain levels. See Methods for details.


platform, which we refer to as an epidermal VR interface. The con- 
struction takes the form of a multilayer stack that includes (1) a thin 
elastomeric layer as a reversible, soft, adhesive interface to the skin, 
(2) a silicone-encapsulated functional layer that supports a wireless 
control system, a means for receiving wirelessly transmitted power, and
an interconnected array of actuators with associated drive electronics,
and (3) a breathable, stretchable fabric coated with a thin film of
silicone, as a physically tough but skin-conformal supporting substrate 
with strain-limiting mechanics to prevent damage to the functional 
materials and components. For aesthetics, a coating of silicone that 
incorporates skin-tone colouration and/or graphics can be included on 
the outward facing side of the fabric, or such features can be incorpo- 
rated directly into the fabric itself. The electronics part of the functional 
layer consists of a collection of copper (Cu) traces encapsulated in poly- 
imide (PI) and formed in narrow, filamentary serpentine geometries 
Fig. 2 | Optimized operation of key electrical and mechanical components of
an epidermal VR system. a, Normalized amplitude-frequency response of a
haptic actuator in contact with a skin phantom. b, Travel amplitude of the
magnet as a function of the input power (data points) for an actuator in contact
with skin phantoms with elastic moduli of 130 kPa. Here, and in c, the symbols
and lines correspond to experimental and mechanics-simulated theoretical
results, respectively. c, Dependence of the resonance frequency on the
elastic modulus of the skin phantom (data points), over a range relevant for
human skin. d, Schematic illustration of an epidermal VR device with an


according to quantitative design rules in stretchable electronics’”. 
These traces interconnect a collection of small, chip-scale integrated 
circuit components and passive elements, including magnetic radio 
frequency (RF) loop antenna structures, resistors, capacitors, rectifiers 
and integrated circuit (IC) switches. System-on-a-chip (SoC) ICs that 
include microcontrollers with capabilities in near-field communication 
(NFC) and general input/output functionality (Fig. 1b-d) serve as control
interfaces to a distributed set of mechanical vibratory actuators,
referred to in the following as haptic actuators. 

Supplementary Fig. 1e-i shows schematic diagrams and optical
images of these actuators. Here, time-dependent Lorentz forces
(Extended Data Fig. 1) follow from the passage of a time-varying current
through a coil that surrounds a permanent magnet. The shell of the
actuator consists of a ring-shaped elastomeric structure that provides
space for the magnet to travel freely in the out-of-plane direction. A
thin disk of PI mounted on top of the PDMS ring and laser-cut with
a semicircular slit serves as a bonding location for the magnet. This
intermediate coil above a transmission antenna. e, Power harvested from the
primary coil of an epidermal VR device with and without an integrated
intermediate (‘inter.’) coil, as a function of distance in the Z direction. Here, and
in f, the lines are guides to the eye. f, Power harvested from the primary coil of
an epidermal VR device with an intermediate coil after power regulation, for
various RF powers applied to the transmission antenna (4-12 W; see key).
g, h, Schematic illustration of a representative epidermal VR system mounted
on the body (g), and the SAR distribution (h). In b, c, e, f, error bars correspond
to the calculated standard deviation.


construct forms a cantilever-like platform, capable of actuation via 
interactions between the magnet and current flowing through the 
coil at the base of the ring. These basic designs can, in principle, be 
extended to length scales that characterize the separation of
mechanoreceptors in the skin of the arms, chest, back and legs (Supplementary
Fig. 2). Moreover, miniaturization of this type of actuator by an
order of magnitude increases the acceleration of the magnet during 
vibration by more than a factor of three for the same electrical power, 
such that the same contact pressure can be achieved by reducing the 
radial and thickness dimensions of the magnet by a factor of 10 and 
3, respectively. As a result, further decreases in power consumption 
might be achievable by reducing the sizes of the actuators. (See details 
of scaling simulations in Methods.) 

Careful optimization of the materials and designs of these actuators, 
guided by computational modelling, allows for power efficient opera- 
tion as skin-coupled haptic interfaces. The diameter and thickness of 
the PI disk and the layout of the slit, the geometry of the PDMS ring, 
Fig. 3 | Wireless control strategies for epidermal VR systems. a, b, Circuit
diagram for an epidermal VR device, including a large primary coil for power
harvesting (a) and several control modules (b), each of which consists of a small
antenna and an SoC, along with eight haptic actuators controlled by eight IC
switches, independently. GND, ground; A, actuator; S, IC switch; C, SoC. c,
Schematic diagram (left) and working principle (right) of an IC switch. The ON/OFF
of the output voltage of the IC switch is controlled by GP I/O ports on the
SoC. d, Diagram of the command interface that supports independent control
over every actuator in the system. Each of the eight GP I/O ports in each SoC is


the type and size of the magnet, and the configuration of the coil can 
be selected to satisfy requirements relevant to body-interfaced opera- 
tion, across many mounting locations and body types, as outlined in 
quantitative detail in Extended Data Figs. 1-4. Frequencies in the range
between 100 Hz and 300 Hz are of greatest interest because they provide
the strongest sensations on human skin, owing to the intrinsic
nature of the responses of the mechanoreceptors”””. Here, amplitudes 
as small as several micrometres can yield distinct tactile responses”. 
Adjusting the angular extent of the slit in the PI disk presents a simple
means to tune the resonant frequency of the actuator (Extended Data
Figs. 2, 3) to a value of 200 Hz for operation on a skin phantom with
an elastic modulus of 130 kPa. With optimized designs, these types of
haptic actuators require only about 1.75 mW to induce a notable sensory
response on the fingertips and hands, with a corresponding amplitude
defined by a one-byte command, such that 8 × n actuators can be initiated in
any form by a portfolio of n bytes. The dashed coloured boxes represent
different SoCs. e, Response time of actuators controlled by four SoCs. The
coloured plots represent different output signals of the actuators. f, A
magnified view of the time required to switch from one actuator to another.
g, Software interface of the control system. h, Three representative working 
frequencies for haptic actuators in an epidermal VR platform: 100 Hz, 200 Hz 
and 300 Hz. 


of approximately 35 µm (Supplementary Videos 1, 2; Fig. 2b), without
parasitic heating effects (Supplementary Fig. 3) and with the option of 
full system operation with small batteries. By comparison, widely used 
commercially available vibration actuators/motors (eccentric rotating 
mass actuators, linear resonant actuators and piezoelectric actuators) 
in consumer gadgets typically require >100 mW. Figure 2c and Extended 
Data Figs. 3, 4 summarize the dependence of the resonance frequency 
on the modulus of the phantom, from 60 kPa to 200 kPa. The results 
suggest a weak, almost negligible, variation in frequency over modu- 
lus values that span those characteristic of skin at different ages and 
across different regions of the body. The amplitude increases linearly 
with input power for all modulus values (Extended Data Figs. 3e, 4e). 
The motion of the actuator involves a vibrational deflection along the 
cantilever beam in a way that directly, and indirectly through inertial
Fig. 4 | Examples of applications of epidermal VR systems. A, Social media
application: a, a girl touches a screen that displays a video feed of her
grandmother, who is wearing an epidermal VR device on her hand and her arm
(inset photograph); b, a dynamic illustration of the pattern of ‘virtual touch
process’ and ‘sense of virtual touch’. B, Prosthetics application: a, a man with a
lower-arm amputation wears a prosthetic arm with a robotic hand and an
epidermal VR device on his upper arm; b, c, the device produces a haptic


effects, couples to the skin. Careful optimization minimizes mechani- 
cal cross-talk between adjacent actuators (Extended Data Figs. 4f, 5). 

These actuators connect to associated antenna structures and 
electronic components through conductive traces with designs that 
minimize strains and resistive losses and, at the same time, operate 
without failure under a full range of bending and twisting motions. As 
with actuator design, 3D finite-element analysis (FEA) techniques guide 
selection of the interconnect geometries and the overall system layouts 
(see Methods for details). The result is a soft, deformable platform 
capable of establishing a comfortable, non-irritating interface to the 
pattern of sensation (‘think and feel’) that reproduces the shape characteristics 
of objects (‘feedback’) held in the robotic hand (‘sensing’). C, Gaming 
application: a, a man wears several epidermal VR devices on different parts of 
his body; b-f, devices activate when a strike occurs on the corresponding body 
part of the game character, namely, the hand (b), elbow (c), arm (d), chest (e) 
and back (f). 


skin, across nearly any region of the body. Computed distributions 
of strain in the copper and corresponding optical images in Fig. 1j-o 
show results for bending, folding and twisting. The equivalent strains 
remain below the elastic limit (0.3%) for a bend of about 145° (with a 
bending radius of approximately 5.1 cm), a fold of about 150° (folding 
radius approximately 5 cm) and a twist of about 50° (ref. "°). (Fig. 1l 
shows a ‘bend’, Fig. 1k shows a ‘fold’.) The overall shapes of the platforms 
can be designed to harmonize with anatomical features; examples 
include ‘flower’, ‘oval’, ‘peanut’, ‘triangle’ and ‘butterfly’ shapes 
(Supplementary Fig. 2). 
NFC protocols serve as the basis for operation and coordinated control, 
with modes that are difficult or impossible to reproduce with 
battery-free far-field techniques. Power delivery and data communication 
use different antennas, designed to allow independent operation 
without interference (Extended Data Fig. 6a, b). A large primary coil 
that exploits the full perimeter of the device platform harvests the 
power needed to operate the entire collection of actuators. Separate 
small antennas serve as wireless interfaces and power sources for each 
of the SoCs, each one of which supports independent control over eight 
actuators. Means for efficient harvesting and utilization of power are 
critical aspects of system design. Most envisioned applications 
require wireless power transfer over distances of at least tens of 
centimetres, to allow integration of the RF power transmission antenna 
into the base of a chair, desk or bed. As summarized in Extended Data 
Fig. 6c-i, the operating distance (Z) of a VR device (using a 12 cm × 12 cm 
square serpentine coil as the primary coil with a Q factor of 20, and 32 
actuators all working at a set power of 1.75 mW) oriented parallel to 
the antenna is about 30 cm and 45 cm for transmission antennas with 
respective dimensions of 31.8 cm × 33.8 cm and 85.2 cm × 62.0 cm, 
and an input power of 12 W to the transmission antenna. The distance 
Z can be increased by increasing the power, optimizing the size of the 
transmission antenna and/or by decreasing the power consumption 
of the system by reducing the sizes of the actuators. 

Addition of an intermediate, single-loop coil (20 cm × 20 cm; tuned 
to 13.56 MHz) incorporated directly into the device (Fig. 2d, Supplementary 
Fig. 4) further improves this range by locally increasing the 
strength of the magnetic field by a factor of about 10 (Extended Data 
Fig. 7). Figure 2d compares the power harvested by the primary coil 
as a function of position Z for cases with and without the intermediate 
coil, using a transmission antenna (85.2 cm × 62.0 cm) powered 
at 12 W. The intermediate coil increases the magnetic field strength 
by about 16 times for Z = 30 cm, and by about 11 times for Z = 50 cm. 
The corresponding increases in received power are approximately 2.5 
times for Z = 30 cm, and 4.5 times for Z = 50 cm, influenced by the load 
(resistance) of the device. The working range can reach 80 cm. An 
active power regulation system enabled by a linear voltage regulator 
ensures consistent operation throughout this full range of distances. 
Figure 2f highlights the ability to deliver a fixed output power from 
the primary coil of the device into a load resistor (2 kΩ) for operation 
of the transmission antenna at powers between 4 W and 12 W across a 
range of distances. The power output is stable for tilt angles up to 60° 
and after more than 10,000 cycles of bending to a radius of 2.8 cm 
(Extended Data Fig. 8, Supplementary Fig. 5). As shown in Fig. 2g, h and 
Supplementary Fig. 4, the mode of operation complies with guidelines 
outlined by the Federal Communications Commission (47 CFR Part 
1.1310 and 15) and the Food and Drug Administration in terms of both 
the specific absorption rate (SAR) and the maximum permissible 
exposure (MPE). The maximum value of the SAR is 0.006 W kg⁻¹ (Fig. 2h), 
substantially less than the exposure limit of 0.08 W kg⁻¹. The maximum 
computed equivalent power density of electromagnetic fields 
is 0.8 mW cm⁻² (Supplementary Fig. 6), which is below the MPE limit 
of approximately 4.9 mW cm⁻². 

The block diagram in Fig. 3a, b summarizes the system architecture 
and overall operation. The first part harvests power through a primary 
coil (Fig. 3a) via the transmission antenna. This power passes through a 
linear voltage regulator to provide a fixed, direct-current voltage (Vcc, 
where CC indicates common collector) to all of the haptic actuators. 
The second part provides control and communication via an array 
of interconnected SoCs, each with a separate small control antenna, 
and with operational control over eight haptic actuators through its 
general purpose input/output (GP I/O) ports (Fig. 3b). These ports 
generate square-wave signals by alternating the output of each GP 
I/O between its high and low settings, at programmable frequencies 
between 100 Hz and 300 Hz. An IC switch associated with each actuator 
(Fig. 3c) transforms Vcc into a square wave defined by the corresponding 
GP I/O, and this signal serves as input to the actuator. In this way, one 
SoC/IC switch combination allows for control/operation of eight actuators, 
simultaneously and independently. Scaling this architecture to 
include multiple unit cells of this type yields systems with arbitrarily 
large numbers of actuators, without limitation. 

Wirelessly writing the necessary NFC Data Exchange Format (NDEF) 
messages into each SoC via the transmission antenna, which serves 
simultaneously as an RF reader, defines the output frequency to each 
haptic actuator, thus programming the entire system for operation. 
Unlike conventional multiplexing approaches to controlling actuators, 
a computer interfaced to the reader collects information on the identification 
codes for each of the SoCs in the system, thereby identifying 
every GP I/O port and haptic actuator. A one-byte message sets the ON/OFF 
command for each of the eight GP I/O ports on each SoC. All GP 
I/O ports are controlled independently. An entire system with 32 haptic 
actuators (4 SoCs, each with 8 GP I/Os) can be controlled in a single 
communication of four bytes; for example, a command of ‘FF FF FF FF’ 
sets all of the actuators to ON (Fig. 3d), and a command of ‘01 00 00 00’ 
sets to ON the actuator connected to the first GP I/O of the first SoC. 
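
The four-byte addressing scheme described above can be illustrated with a minimal Python sketch. It is not the authors' software: the function names are hypothetical, and the bit order (bit k of a byte drives GP I/O k + 1 of the corresponding SoC) is an assumption that is only confirmed by the text for the first GP I/O of the first SoC and for the all-ON case.

```python
NUM_SOCS = 4        # from the text: 4 SoCs in the 32-actuator system
PORTS_PER_SOC = 8   # from the text: 8 GP I/O ports per SoC


def build_command(active):
    """Pack ON states into one byte per SoC.

    `active` is an iterable of (soc, port) pairs, both 1-based.
    Assumption: bit (port - 1) of byte (soc - 1) switches that port ON.
    """
    command = bytearray(NUM_SOCS)
    for soc, port in active:
        if not (1 <= soc <= NUM_SOCS and 1 <= port <= PORTS_PER_SOC):
            raise ValueError(f"no such actuator: SoC {soc}, GP I/O {port}")
        command[soc - 1] |= 1 << (port - 1)
    return bytes(command)


def to_hex(command):
    """Format a command as space-separated hexadecimal bytes."""
    return " ".join(f"{byte:02X}" for byte in command)


all_on = build_command(
    (soc, port)
    for soc in range(1, NUM_SOCS + 1)
    for port in range(1, PORTS_PER_SOC + 1)
)
first_only = build_command([(1, 1)])

print(to_hex(all_on))      # FF FF FF FF
print(to_hex(first_only))  # 01 00 00 00
```

Because every actuator maps to its own bit, any spatial pattern of the 32 actuators fits in one four-byte write, which is consistent with the independent control described in the text.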

The time required to change from one system configuration to 
another is in the millisecond regime, as shown in Fig. 3e, f, which is about 
50 times faster than the reaction time to tactile stimulation. A graphical 
interface with a touch screen, as in Fig. 3g and Supplementary Fig. 7, 
allows a user to change patterns of actuation rapidly and, separately, 
to select the amplitudes and frequencies of the vibratory responses. 
An array of 32 lasers provides a means to visualize operation at the 
system level, via projection of individual beams reflected from each 
of the haptic actuators and onto a monitoring screen (Extended Data 
Figs. 9, 10 and Supplementary Video 3). Recordings from a high-speed 
camera replayed at 16.7x and 133x slow motion reveal the time dynam- 
ics of the vibratory motions of the actuators and their programmed 
control (Supplementary Video 4). 

The transmission antennas and devices can be configured in various 
ways for different use scenarios (Supplementary Fig. 8). Replacing the 
IC switches with high-power compliance transistors and modifying the 
cantilever designs improves the power delivery to the actuators and 
increases their vibratory bandwidths and amplitudes, for enhanced 
sensation across different body types and anatomical locations. A single-stage 
voltage regulation scheme conditions the power-harvesting 
efficiencies of receiving antennas with different form factors. Figure 4A 
summarizes a possibility in virtual interactions via social media. Here, 
a girl virtually touches her grandmother’s hand through an interface 
on the screen of a laptop that simultaneously displays a video/audio 
feed. In this example, two epidermal VR devices are mounted on the 
grandmother, who experiences a haptic sensation in the form of a continuous 
wave of vibratory excitation extending sequentially down 
from her forearm to her hand in a spatio-temporal pattern of touch to 
match that of the granddaughter’s fingertips on the image on the touch 
screen. A second representative application is in tactile feedback for 
use of robotic prosthetic devices. Figure 4B shows a man, whose lower 
arm has been amputated, with an epidermal VR device on his residual 
limb as he uses a prosthetic arm to grasp objects. Here, sensors on the 
prosthetic detect the shape of the object and this information serves 
as input to create a virtual haptic representation of the shape on his 
upper arm. The third application is in haptic engagement in gaming. 
In the example of Fig. 4C, a gamer wears several epidermal VR devices 
across different locations of the body. As a strike occurs in this combat 
game, haptic actuation reproduces the pattern of the impact at a cor- 
responding location. 

The epidermal VR systems introduced here exploit thin, soft archi- 
tectures capable of laminating directly onto the skin as a platform for 
programmable control of large arrays of miniaturized haptic actua- 
tors in wireless modes of operation and with lightweight, battery-free 
designs. This class of technology is qualitatively distinguished in form 
and function from previous attempts at programmable haptic interfaces 


to the body. Comprehensive experimental and computational stud- 
ies of the various subsystems in these platforms yield a basic under- 
standing of their operation and a set of guidelines for design choices. 
Demonstrations in social media interactions, prosthetic feedback and 
video gaming are representative of a broad spectrum of potential appli- 
cations, which also include systems for personalized rehabilitation, 
surgical training, educational feedback, and multimedia entertainment 
experiences. Many opportunities exist to improve the performance of 
these systems by increasing the strength of mechanical actuation at 
the skin interface. 
Methods 


Fabrication of the system of electronics 

A sheet of polyimide (PI, 12.5 µm) coated with a thin layer of copper (Cu, 
50-200 µm wide and 18 µm thick) served as the substrate for the antenna 
structures and the electrical interconnects. Photolithography and etching 
yielded patterns of Cu in the desired geometries. For systems smaller than 
4 inches × 4 inches, the process used a positive photoresist (AZ P4620, 
AZ Electronic Materials) spin-cast at 3,000 r.p.m. for 30 s, soft baked on 
a hot plate at 110 °C for 4 min, exposed to ultraviolet (UV, wavelength 
350-400 nm) light to a dose of 500 mJ cm⁻², and developed for ~70 s in a 
basic solution (AZ 400K/deionized (DI) water in a 1:3 volume ratio). For 
systems larger than 4 inches × 4 inches, the process used dry film photoresist 
(Dupont, 38 µm thick), bonded onto the Cu foil by a roll laminator at 
110 °C, soft baked on a hot plate at 110 °C for 4 min, exposed to UV light 
to a dose of 500 mJ cm⁻², post baked on a hot plate at 110 °C for 2 min and 
developed for ~180 s in a basic solution (MIF 917/DI water in a 1:1 volume 
ratio), and wet etched (CE-100 copper etchant, Transene) for ~2 min with 
frequent rinsing with DI water. In both cases, the photoresist was removed 
by acetone and the substrates were then rinsed with DI water. 


Fabrication of the haptic actuators 

The first step involved placing a Cu coil (wire diameter of 50 µm, with 
300 turns to form a coil with an inner diameter of 3 mm and an outer 
diameter of 14 mm (Yisu Electronics, Inc.)) in the centre of an acrylic 
mould (mould 1) with a silicone release reagent (Clearco Product Co., 
Inc.). Pouring a prepolymer to poly(dimethylsiloxane) (PDMS; Sylgard 
184, Dow-Corning; 10:1 weight ratio of prepolymer to crosslinker) into 
mould 1 submerged the Cu coil under a layer of PDMS with modulus of 
~1 MPa and thickness of 0.2 mm. Baking in an oven at 70 °C for 1 h cured 
the material into a solid, elastomeric form. Next, filling with additional 
PDMS prepolymer, and mounting in a second, matching mould (mould 
2) held in place with set screws prepared the assembly for a second curing 
step (overnight in an oven at 70 °C) to seal the coil structure in PDMS 
(~1 MPa; outer and inner diameters of 18 mm and 12 mm, respectively; 
thickness of 2.5 mm), shaped to meet the design requirements. The 
dimensions of the resulting PDMS ring with coil in its base, formed by 
release from moulds 1 and 2, were: 2.4 mm in thickness and 18 mm in 
diameter, with an inner cavity of 2.2 mm depth and 12 mm diameter 
(Extended Data Fig. 2). The second part of the actuator consisted of a 
permanent magnet (nickel-plated neodymium magnet, diameter of 
8 mm, thickness of 1.6 mm) mounted on a PI disk. Laser cutting formed 
circular shapes with diameters of 18 mm from sheets of PI (125 µm thick, 
DuPont) and semicircular slits with diameters of 8 mm and central 
angles of 217° (Extended Data Fig. 2). A strong double-sided adhesive 
(Kapton, DuPont) attached a disk-shaped magnet (nickel-coated neodymium 
magnet, diameter of 8 mm, thickness of 1.6 mm, weight 0.6 g, 
Bunting Magnetics Co.) on the cantilever part of the PI disk (18 mm 
diameter, 125 µm thick). The final step involved bonding the PI disk, 
with the magnet mounted on the back side, on top of the PDMS ring 
with a silicone adhesive (Kwik-Sil, WPI Inc.). Each completed actuator 
weighs 1.4 g. The frequency of the current input to the coil defined the 
frequency of vibration of the magnet. The magnitude of the current 
was controlled at a set point (~5 mA for experiments reported here). 


Scaling simulation of the haptic actuators 

Careful selection and optimization of various parameters of the actuators 
and the entire VR system focused partly on generating a sufficiently 
strong alternating magnetic field with the Cu coil to excite suitable 
vibrations of the magnet. According to the magnetic field distribution 
around the Cu coil, the acceleration a of the magnet can be expressed as 


: J PinpurSNn Derrei 


(Derek + a’) 


where P_input is the input power, S is the cross-sectional area of the copper 
wire, N is the layer number, n is the turn number of each layer, D_outer 
is the outer diameter of the Cu coil, and d is the distance between the 
Cu coil and the magnet (Extended Data Fig. 2). Further, the contact 
pressure p can be expressed as 


$$p = a \rho h$$


where ρ and h are the mass density and the thickness of the cylindrical 
magnet, respectively. The pressure distribution on the skin associated 
with operation of the actuator can be calculated by FEA, as in 
Supplementary Fig. 9. These results suggest that the force f applied 
on the skin for an actuator with input power of 1.75 mW is 135 mN, 
from $f = \int_s p\,\mathrm{d}s$. To examine the effects of miniaturization, we fix 
P_input, N and S, and reduce D_outer and d by 10 times. An example of 
miniaturization involves decreasing n by 10 times with S fixed. 
As a result, the acceleration of the magnet increases by ~3.6 times 
for the same power input, such that the same contact pressure 
can be achieved by reducing the radial and thickness dimensions 
of the magnet by a factor of 10 and 3, respectively. The size of other 
components in the actuator can be adjusted according to the size of 
the Cu coil. 
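
As a rough consistency check on this scaling argument, the pressure relation (contact pressure equals acceleration times magnet density times magnet thickness) can be evaluated before and after miniaturization. The Python sketch below is illustrative only: the reference acceleration is a hypothetical placeholder that cancels in the ratio, and the magnet density is the value listed for the mechanics simulation. The radial dimension does not enter the relation, so only the thickness reduction affects the pressure.

```python
def contact_pressure(acceleration, density, thickness):
    """Contact pressure p = a * rho * h for a cylindrical magnet (SI units)."""
    return acceleration * density * thickness


RHO_MAGNET = 8.08e3   # kg m^-3, magnet density used in the mechanics simulation
H_ORIGINAL = 1.6e-3   # m, magnet thickness from the fabrication description
A_REFERENCE = 1.0     # m s^-2, hypothetical placeholder; cancels in the ratio

p_original = contact_pressure(A_REFERENCE, RHO_MAGNET, H_ORIGINAL)
# Miniaturized case from the text: acceleration up ~3.6 times, thickness down 3 times.
p_scaled = contact_pressure(3.6 * A_REFERENCE, RHO_MAGNET, H_ORIGINAL / 3)

ratio = p_scaled / p_original
print(f"{ratio:.2f}")  # 1.20
```

A ratio slightly above 1 agrees with the statement that the same contact pressure remains achievable after the thickness is reduced by a factor of 3.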


Device integration and assembly 

Low temperature solder joints bonded and electrically connected all 
of the components, including the SoCs (RF430FRL15xH NFC, Texas 
Instruments), jumper wires (resistors, 0 Ω, Stackpole Electronics, 
Inc.), capacitors (10 nF to 2.2 µF, Murata Electronics North America), 
diodes (SMP1345, Skyworks Solutions, Inc.), power regulator (L78L, 
STMicroelectronics), IC switches (74LVC1G384, Nexperia BV), jump 
wires and haptic actuators to corresponding contact pads on the Cu/PI 
substrate. A thin (~0.2 mm) coating of an ultra-low-modulus silicone 
material (0.1 mm Silicone C, Silbione, ~3.0 kPa, Elkem Silicones) on a 
stretchable fabric substrate (Spandex) served as an adhesive between 
the cloth and the electronic/haptic platform. Casting and curing a low 
modulus formulation of PDMS (Silicone B, PDMS, ~60 kPa, Sylgard 184, 
Dow-Corning) formed a uniform encapsulation layer (2.5 mm thick) 
over the electrical connections. A layer of skin-coloured PDMS (Sili- 
cone A, PDMS, ~0.2 mm, ~60 kPa, Sylgard 184, Dow-Corning) aligned 
and bonded on top of the device acted as the top encapsulation layer. 
These soft silicone coatings provided reversible adhesion to the skin, 
with an adhesion energy of ~90 N m⁻¹ for hairless areas and ~80 N m⁻¹ 
for hairy areas (Supplementary Fig. 10). Thin PI (12.5 µm) layers with 
the same dimensions as the cantilever beams in the haptic actuators 
were aligned and placed on top of the actuators before covering the top 
encapsulation layer, enabling the magnets to have freedom for vibra- 
tory motions. The total weight is 130 g for a system with 32 actuators 
in a square array, 40 g for the flower shaped device, 38 g for the oval 
shaped device, 81 g for the peanut shaped device, 99 g for the triangle 
shaped device and 120 g for the butterfly shaped device. Supplementary 
Fig. 11 shows that the water vapour transmission rate (WVTR) of the 
silicone/fabric sample is ~0.55 g h⁻¹ m⁻². The WVTR of a similar sample, 
but with perforating holes (1 mm diameter and 8 mm pitch in a 
square lattice, with a hole area fraction α of ~1.2%) is ~3.69 g h⁻¹ m⁻², 
which is comparable to that of a conventional breathable waterproof 
bandage (5.72 g h⁻¹ m⁻²) and somewhat smaller than that of a standard 
cloth bandage (9.15 g h⁻¹ m⁻²) (Mannings, Hong Kong). We estimate 
that the complete epidermal VR device, with all of the impermeable 
active components and interconnects, can accommodate perforations 
at an overall areal density of α = 1.2%, comparable to that of the test 
structure. As a result, the WVTR for a system with perforations at this 
density should also be in the range of 3-4 g h⁻¹ m⁻². These epidermal VR 
devices can be worn on the skin (including hairy areas) for extended 
periods, with various levels of physical activity, without irritation (Sup- 
plementary Fig. 12). 
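
The quoted hole area fraction follows directly from the perforation geometry. The short Python sketch below (function name hypothetical) assumes one circular hole per square unit cell of the lattice, which is the usual reading of 'pitch in a square lattice'.

```python
import math


def hole_area_fraction(diameter_mm, pitch_mm):
    """Area fraction of circular holes on a square lattice.

    Assumes one hole per unit cell of area pitch_mm ** 2.
    """
    hole_area = math.pi * (diameter_mm / 2) ** 2
    return hole_area / pitch_mm ** 2


alpha = hole_area_fraction(diameter_mm=1.0, pitch_mm=8.0)
print(f"{alpha:.2%}")  # 1.23%
```

The result, about 1.2%, matches the value stated for the perforated test sample.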


Mechanics simulation of the epidermal VR device 

The commercial software ABAQUS (v6.10) was used to study the 
mechanics of the devices. The layouts of the chips and the shapes of 
the interconnects were optimized to decrease the strain/stress level 
and to avoid entanglements in the interconnects under different types 
of external loads (bending, folding and twisting) (Fig. 1m-o). The resonance 
frequency of the actuator was tuned to 200 Hz (the frequency to 
which humans are most sensitive) by designing the central angle θ of 
the PI layer in the actuator (Fig. 1e, Extended Data Fig. 2) to increase the 
vibration intensity. The arrangement of the actuators was optimized to 
decrease the mutual interference between them by adjusting their rela- 
tive angles (Extended Data Fig. 5). The fabric cloth, Silicone A, Silicone 
B and Silicone C, PDMS, phantom skin and magnet were modelled by 
hexahedron elements (C3D8R) while the thin copper and PI film were 
modelled by composite shell elements (S4R). The number of elements 
in the model was -3 x 10’, and the minimal element size was 1/6 of the 
width of the narrowest interconnects (50 µm). The mesh convergence 
of the simulation was guaranteed for all cases. The elastic modulus 
(E), Poisson’s ratio (ν) and density (ρ) are as follows: E_cloth = 391 kPa, 
ν_cloth = 0.4, ρ_cloth = 0.96 × 10³ kg m⁻³; E_Silicone_A = 60 kPa, ν_Silicone_A = 0.5, 
ρ_Silicone_A = 0.96 × 10³ kg m⁻³; E_Silicone_B = 60 kPa, ν_Silicone_B = 0.5, 
ρ_Silicone_B = 0.96 × 10³ kg m⁻³; E_Silicone_C = 3 kPa, ν_Silicone_C = 0.5, 
ρ_Silicone_C = 0.96 × 10³ kg m⁻³; E_PDMS = 1 MPa, ν_PDMS = 0.5, 
ρ_PDMS = 0.96 × 10³ kg m⁻³; E_skin = 130 kPa, ν_skin = 0.5, 
ρ_skin = 1.05 × 10³ kg m⁻³; E_magnet = 113 GPa, ν_magnet = 0.34, 
ρ_magnet = 8.08 × 10³ kg m⁻³; E_PI = 2.5 GPa, ν_PI = 0.34, ρ_PI = 0.91 × 10³ kg m⁻³; 
and E_Cu = 119 GPa, ν_Cu = 0.32, ρ_Cu = 8.96 × 10³ kg m⁻³. 


Electromagnetic simulation of the epidermal VR device 

The finite-element method was used in the electromagnetic simula- 
tions to study the magnetic field around the transmission antenna, 
primary coil, NFC coil, Cu coil (in actuators) and intermediate coil. 
The simulations were performed using commercial software Ansys 
HFSS 15 (Ansys Inc.), where the lumped port was used, and the port 
impedance was set according to the matching capacitor (see below). 
An adaptive mesh (tetrahedron elements), together with a spherical 
surface (2,000 mm in radius) as the radiation boundary, were adopted 
to ensure computational accuracy. The electromagnetic parameters in 
the material library of Ansys HFSS were used in the simulation. 


Characterization of the intermediate coil 

The resistance R and inductance L of an intermediate coil (using cop- 
per wire 0.7 mm in diameter) with dimensions 20 cm x 20 cm were 
measured with an impedance analyser. The results agree with simu- 
lations (Extended Data Fig. 7e, f). Addition of a matching capacitor 
with C = 168 pF yields a resonant frequency f of ~13.56 MHz (using 
f = 1/[2π(LC)^0.5]). The Q factor of the intermediate coil, defined by Q = 2πfL/R, 
is ~200 at a frequency of 13.56 MHz (Extended Data Fig. 7g). 
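
The two relations above can be inverted to see what coil parameters the quoted numbers imply. The Python sketch below is a back-of-envelope check, not a reproduction of the impedance-analyser data: the inductance and resistance it prints are inferred from the stated C, f and Q, and are not values reported in the text.

```python
import math

F_TARGET = 13.56e6   # Hz, operating frequency stated in the text
C_MATCH = 168e-12    # F, matching capacitor stated in the text
Q_FACTOR = 200       # approximate Q factor stated in the text


def resonant_frequency(inductance, capacitance):
    """Resonance of an LC circuit: f = 1 / (2*pi*sqrt(L*C))."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance * capacitance))


def inductance_for(frequency, capacitance):
    """Invert the resonance relation for L."""
    return 1.0 / ((2.0 * math.pi * frequency) ** 2 * capacitance)


def series_resistance(frequency, inductance, q_factor):
    """Invert Q = 2*pi*f*L/R for R."""
    return 2.0 * math.pi * frequency * inductance / q_factor


L_coil = inductance_for(F_TARGET, C_MATCH)
R_coil = series_resistance(F_TARGET, L_coil, Q_FACTOR)

print(f"L = {L_coil * 1e6:.2f} uH")   # L = 0.82 uH
print(f"R = {R_coil:.2f} ohm")        # R = 0.35 ohm
```

An inductance slightly below 1 µH is the order of magnitude expected for a single-turn loop of this size, so the stated capacitor value and resonance frequency are mutually consistent.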


Electromagnetic simulations for SAR and MPE 

The system utilizes RF at 13.56 MHz, a frequency where biological 
tissues exhibit negligible absorption. The position of the body, there- 
fore, has little effect on the operation, and the system functions well 
with a variety of obstacles in the environment, even including metal 
features, owing to the magnetic nature of the wireless link. FEA (using 
Ansys HFSS 15) was used to determine whether the epidermal VR 
platform operates within the specific absorption rate (SAR) and 
maximum permissible exposure (MPE) requirements outlined by the 
Federal Communications Commission (FCC; 47 CFR Part 1.1310). The 
maximum transmitting power (12 W) of the transmission antenna 
(852 mm x 620 mm) was adopted in the simulation. The impedance 
of the circuit part is equivalent to a 170-Ω (the minimal load of the 
VR system) resistor, measured in the experiments, in series with the 
primary coil. The distance between the body and the coils/antennas 
of the epidermal VR system is 3 mm, which is the distance when the 


epidermal VR system is mounted on the body. The density of the body 
is taken as 1,000 kg m⁻³. 


NFC protocols, software control and system operation 
Programming each SoC (RF430FRL15xH, Texas Instruments) with the 
Code Composer Studio (CCS) enabled the generation of square wave 
signals by alternating the output of the GP I/O port between its HIGH 
(high voltage output) and LOW (zero voltage output) settings. The 
frequency of the resulting square waves can be adjusted from 100 Hz 
to 300 Hz in a user-definable way through the software interface, 
via changes in the SoC program. Each SoC controls eight separate 
GP I/O ports, and each of these can be controlled independently. 
The program incorporated an interrupt mechanism as the control on 
and off for the square wave. Writing a specific hexadecimal value into 
a particular register on the SoC activated the GP I/O port to gener- 
ate the square wave signal. Writing any other hexadecimal value into 
this same register deactivated the port. As such, the interrupt acts as 
the control mechanism for the square wave. Other hexadecimal com- 
mands initiated data transfer via NFC Data Exchange Format (NDEF) 
messages. The necessary NDEF messages can be written into the SoC 
with a transmission antenna that operates at 13.56 MHz (FEIG, ID ISC.LRM2500-A), 
with an output power between 1 W and 12 W and the ability 
to interface to a computer/laptop via a USB port. The reader, connected 
to a computer/laptop, in this manner served as an interface to 
control the writing process using a custom graphical user interface. 
The software interface displays the connection status of the reader 
with the computer. Each virtual button (pixel) on the touch panel is 
associated with a 4-byte command. Control of each actuator can, in 
this way, be achieved without any interference or cross-talk. To avoid 
time delay, sequential operation was programmed into the SoC, lead- 
ing to millisecond response times. SoCs can be differentiated from one 
another by their unique ID numbers (Supplementary Fig. 5). In this way, 
control of multiple SoCs can be accomplished without uncertainties 
in directionality in RF transmission. 
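
The square-wave generation described in this section amounts to toggling a port between HIGH and LOW every half period. The Python model below is a sketch for illustration only; it is not the firmware written in Code Composer Studio, the function names are hypothetical, and the 12 kHz sampling rate is an arbitrary choice made so that the half periods at 100 Hz, 200 Hz and 300 Hz are whole numbers of samples.

```python
def toggle_interval_s(frequency_hz):
    """Time between HIGH/LOW toggles for a 50% duty-cycle square wave."""
    if not 100 <= frequency_hz <= 300:
        raise ValueError("the text describes a 100-300 Hz range")
    return 1.0 / (2.0 * frequency_hz)


def square_wave(frequency_hz, n_samples, sample_rate_hz=12_000):
    """Sampled port levels (1 = HIGH, 0 = LOW), starting HIGH.

    Integer arithmetic keeps the toggle positions exact; frequency_hz
    and sample_rate_hz are expected to be integers.
    """
    return [
        1 - ((2 * frequency_hz * i) // sample_rate_hz) % 2
        for i in range(n_samples)
    ]


for f in (100, 200, 300):
    print(f"{f} Hz -> toggle every {toggle_interval_s(f) * 1e3:.3f} ms")

one_period_200hz = square_wave(200, 60)   # 30 samples HIGH, then 30 samples LOW
```

At 200 Hz the port toggles every 2.5 ms, so a one-byte ON/OFF command only needs to gate this free-running waveform rather than define each edge.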


Visualization of system level operation using an array of 32 
lasers 

The measurements relied on a custom-built system with an array of 
32 lasers. Each laser beam (635 nm, 2 mW continuous wave) reflects 
from a small reflective disk (0.15-mm-thick pieces of glass of diameter 
8 mm, coated with 10 nm Cr and 100 nm Au) mounted on a corresponding 
haptic actuator in the epidermal VR system. The full collection of 
reflected beams projects onto a monitoring screen to allow direct visu- 
alization of the spatial patterns of activation, as well as the amplitudes 
and orientations of the vibratory motions of the cantilevers associated 
with the actuators. The long light path geometrically converts the 
small amplitude vibrations of the actuators into large motions of the 
reflected spots, for easy visualization. An optical system of mounting 
stages allowed adjustment of each laser beam, including its incident 
angle, tilt angle and distance from the actuators. Each of the 32 laser 
modules was held by an angle post clamp, enabling an independently 
adjustable laser pattern to be configured. Once a laser module was 
adjusted to the proper position, the angle post clamps were fixed. 

In this customized system, six horizontal posts supported the 32 
laser modules: two of the posts supported four modules, the rest each 
supported six modules. The six posts were arranged one above the 
other, and this array was mounted on two vertical posts by angle post 
clamps. The two vertical posts were fixed on a solid aluminium optical 
breadboard (Extended Data Fig. 9a). Careful adjustments yielded a 
pattern of laser beams that were aligned with all the haptic actuators 
across the device. 


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 
Data availability 


All data are contained within the manuscript. Raw data are available 
from the corresponding authors upon reasonable request. 
Extended Data Fig. 1 | Study of the magnetic field strength in an actuator. a, 
Distribution of the normalized magnetic field strength (H) around the Cu coil 
(300 turns) used in the actuator, where d is the distance between the Cu coil and 
the magnet. The dashed white circles correspond to the holes in the Cu coil. b, 
Normalized magnetic flux through the haptic actuator versus the 
distance to the Cu coil (300 turns) of the actuator. c, Schematic illustration of 
the distance between a haptic actuator and a Cu coil in the actuator. d, The 
normalized magnetic flux through the magnet versus turn number 
and outer diameter (D_outer) of the Cu coil. 
Extended Data Fig. 2 | Resonant frequency tuning of the actuator. a, Top view 
and cross-sectional view of the actuator design. The parameters presented 
here are optimized for tuning the resonance frequency to the skin-sensitive 
range. b-d, Optical images and normalized amplitude-frequency curves of 
three actuators with different central angles (θ) of 150° (b), 186° (c) and 217° (d), 
working without any contact. The actuators shown in b and c are 18 mm in 
diameter and 2.5 mm thick. 
Extended Data Fig. 3 | Optimization of the actuator. a, Cross-sectional 
schematic illustration of an actuator in contact with skin. b, Theoretical results 
of the resonance frequency of actuators shown in a versus the central angle θ of 
the PI handling layer. The dashed lines indicate the resonance frequency of 
200 Hz at θ = 217°. c, Comparison of experimental (E; data symbols) and 
simulation (theory, T; lines) results. d, Experimental results of the normalized 
amplitude-frequency curves of the actuator (θ = 217°) in contact with skin, for 
different values of skin elastic modulus: 60 kPa, 130 kPa and 200 kPa. e, Travel 
amplitude of the magnet as a function of the input power (data points) for an 
actuator in contact with artificial skin samples with elastic moduli of 60 kPa, 
130 kPa and 200 kPa. In c, e, error bars correspond to the calculated standard 
deviation. 


Extended Data Fig. 4 | Study of the mechanical behaviour of the actuator 
when in contact with skin. a, b, Cross-sectional schematic illustration (a) and 
optical image (b) of an actuator in contact with artificial skin. Here PDMS, 
serving as artificial skin, had three different values of elastic modulus, and 
glass was used for artificial bone. c, Comparison of the experimental (E) and 
simulation (theory, T) results of the resonance frequency of the actuator in 
contact with different moduli and thicknesses of artificial skin. The error bars 
correspond to the calculated standard deviation. d, Optical images captured 
using a high-speed camera and a working actuator travelling up and down, 
when in contact with 130-kPa artificial skin. e, FEA results for the amplitude of 
an actuator: left, when separated from skin; right, when in contact with skin. 
f, Schematic illustration and FEA results (colour-coded amplitude) for 
mechanical coupling between an array of haptic actuators, with activation in an 
‘N’ pattern. 
Extended Data Fig. 5 | Mutual interference study of an actuator array. a,
Mutual interference of two actuators at different relative angles α at 200 Hz.
Two representative cases were studied, with one actuator (no. 1) positioned
along (left) or perpendicular to (right) the bisector of the other actuator (no. 2).
Here, only actuator 1 was actuated. The amplitude ratio, that is, the amplitude
of actuator 2 over the amplitude (induced by mutual interference) of actuator 1,
shown in the table demonstrates that α = 45°, 90° and 270° result in relatively
small mutual interference for both representative cases simultaneously.
b, Mutual interference of two small actuators at different relative angles α at
their resonant frequency of 200 Hz. The size of the actuators and the distance
between them were scaled to 1/10 of the original design as shown in a, that is,
the distance between two actuators was 2.1 mm rather than 21 mm. Here the
thickness of the PI disk was set at 1.8 µm to enable the resonant frequency for
the small actuator to be 200 Hz. Two representative cases were studied, with
one actuator (no. 1) positioned along (left) or perpendicular to (right) the
bisector of the other actuator (no. 2). Here only actuator 1 was actuated. The
amplitude ratios (actuator 2 to actuator 1) due to mutual interference are
shown in the table. c, Optimization of the actuators’ arrangement. The mutual
interference among actuators was studied for α = 0°, α = 45° and a combination
of 90°/270° referring to the simulation results in a, and for representative cases
when actuators A (around the centre), B (near the boundary) and C (near the
corner) are actuated separately. The results show that α = 90°/270° yields the
smallest mutual interference (see tables under). The number gives the
amplitude ratio due to the mutual interference among the actuators (that is,
the amplitude of all actuators over the amplitude of the activated actuator),
where 1 represents the activated actuator.
Extended Data Fig. 6 | Study of the key electrical components of the
epidermal VR system. a, Interference between the primary coil and NFC
antennas. Shown are schematic illustrations of a primary coil with (left) or
without (right) four NFC antennas along the Z direction of the transmission
antenna. b, Comparison of experimental (E) and theoretical (T) results of
voltage induced by a single primary coil versus a primary coil with four NFC
antennas. c, d, Transmission antennas used for operating the VR devices: c,
small size, 318 mm × 338 mm; and d, large size, 620 mm × 852 mm. e, An
transmission antenna placed in the X-Y plane. f, The magnetic field strength (H)
in the Z-X plane (the middle plane of the coil) for the small (left) and large (right)
transmission antennas (RF readers). g, Theoretical results show that the small 
and large transmission antennas are suitable for short (<24 cm) and long 
(>24 cm) working distances, respectively. h, i, Output power of the serpentine 
primary coil of the epidermal VR device as a function of distance to the small 
transmission antenna (h), and the large transmission antenna (i). The error bars 
correspond to the calculated standard deviation. 
Extended Data Fig. 7 | Electrical properties of the intermediate coil.
a, Configuration of an intermediate coil (20 cm × 20 cm, wound with Cu wire
with a diameter of 0.2 mm) oriented parallel to the X-Y plane, at a distance Z
from the transmission antenna. b, Computational results for the magnetic field
distribution induced by an RF transmission antenna tuned to resonance with a
receiver antenna with and without an intermediate coil. c, Comparison of the
magnetic field strength along the Z direction of the transmission antenna with
(W) and without (W/O) the intermediate coil. d, Amplification factor n along
the Z direction of the transmission antenna with and without the intermediate
coil. e-g, Simulation and experimental results for the inductance (e), resistance
(f) and Q factor (g) as a function of frequency.
Extended Data Fig. 8 | Mechanical characterization of the epidermal VR
system. a-d, Output power from the primary coil of an epidermal VR device as
a function of tilt angle y (a, geometry; b, data), bending radius, R (c) and
bending cycles to an R of 2.8 cm (d). The distance between the device and the
antenna was fixed at 20 cm for all measurements. e, Measured phase responses
of the antennas used for wireless control over the SoCs as a function of radius of
curvature. The resonance frequency is 13.56 MHz before bending. Bending
induces only very slight shifts in these curves.
Extended Data Fig. 9 | Visualization of system level operation using an array
of 32 lasers. a, Schematic illustration of a custom-built laser array system for
real-time visualization of the operation of a complete epidermal VR system.
b-d, The array of lasers (b), the corresponding array of beams reflecting from
the haptic actuators, each mounted with a reflective disk (diameter of 8 mm),
across an entire system (c), and their arrival at a monitoring screen (d).
e, Representative frames extracted from video recorded using a high-speed
camera to capture oscillatory motions of each of the laser spots. These motions
directly determine the motions of the cantilever-based actuators. f, Schematic
illustration of a laser spot produced by projection of a reflected beam onto a
screen during the operation of the actuator. g, Representative frames
extracted from video recorded using a high-speed camera showing the
oscillatory motion of laser spot 1. h, Pictures of a laser spot on a reflector that
mounts on a haptic actuator (left) and on the monitoring screen (right). The
diameter of the laser spot is ~3 mm. i, Calculated displacements of four
actuators determined from the measurement setup geometry and the
amplitude of motion of the laser spots in g, e and Supplementary Video 4. The
traces are offset in the y direction to facilitate visual inspection. The calculated
displacements are somewhat smaller than those measured directly from
individual actuators using high-speed cameras owing to slight misalignments
of the lasers and to shifts in the resonance frequencies due to absence of the
PDMS encapsulation layer for the devices measured using the laser technique.
The results allow direct visualization and measurement of the vibration
amplitudes, direction and frequency of the cantilever beams associated with
each actuator across the full array.
Extended Data Fig. 10 | Pictures of the operation of an epidermal VR system,
visualized with a reflected array of 32 laser beams. Activation of a given
haptic actuator causes the corresponding reflected spot to transform from a
circular to an elliptical shape, owing to the vibratory motions (top row). The
results in the lower three rows show representative spatial patterns of
actuation, including numbers 0 to 9 and letters ‘N’, ‘A’, ‘T’, ‘U’, ‘R’ and ‘E’. We note
that the detailed shapes of the laser spots on the screen depend critically on the
positioning of each of the beams across the corresponding reflectors mounted
on the cantilevered actuator structures.
The distribution of charge density in materials dictates their chemical bonding,
electronic transport, and optical and mechanical properties. Indirectly measuring the
charge density of bulk materials is possible through X-ray or electron diffraction
techniques by fitting their structure factors, but only if the sample is perfectly
homogeneous within the area illuminated by the beam. Meanwhile, scanning
tunnelling microscopy and atomic force microscopy enable us to see chemical bonds,
but only on the surface. It remains a challenge to resolve charge density in
nanostructures and functional materials with imperfect crystalline structures, such
as those with defects, interfaces or boundaries at which new physics emerges. Here we
describe the development of a real-space imaging technique that can directly map the
local charge density of crystalline materials with sub-ångström resolution, using
scanning transmission electron microscopy alongside an angle-resolved pixellated
fast-electron detector. Using this technique, we image the interfacial charge
distribution and ferroelectric polarization in a SrTiO3/BiFeO3 heterojunction in four
dimensions, and discover charge accumulation at the interface that is induced by the
penetration of the polarization field of BiFeO3. We validate this finding through
side-by-side comparison with density functional theory calculations. Our charge-density
imaging method advances electron microscopy from detecting atoms to imaging
electron distributions, providing a new way of studying local bonding in crystalline
solids.


The ability to directly visualize the distribution of electrons in solids
and molecules could greatly advance science, as nearly all physical
properties of materials are determined by the rearrangement of electron
charges between nuclei when atoms aggregate together. However,
observing electrons in materials at high spatial resolution is not
routine. Unlike other diffraction methods, aberration-corrected
scanning transmission electron microscopy (AC-STEM) offers the possibility
of achieving atomic-resolution imaging, by using an electron
beam focused to a sub-ångström width. While penetrating through a
specimen, the electron beam interacts with the local electric field in its
pathway, resulting in a change in its momentum. Recently, differential
phase contrast (DPC) imaging in STEM was developed to estimate the
momentum of the electron probe by counting the integrated scattering
electrons in segments of an annular STEM detector. If the pattern of
the entire electron beam can be captured using a fast camera in STEM,
the electric field at each scanned position can be calculated from the
momentum change with high precision. Using this method, electric
fields and charge densities have been mapped in simulations, but
experimental results have been stymied by strong noise and insufficient
resolution. Here, using the state-of-the-art AC-STEM and a
high-speed pixellated electron detector, we have successfully imaged
the local electric field and charge density in SrTiO3, BiFeO3 and a SrTiO3/
BiFeO3 heterojunction with ultrahigh spatial resolution in real space.
The high fidelity of this approach, supported by comparisons with
density functional theory (DFT) calculations, shows its potential for
studying the complex interplay between charge, field and atomic structure
in heterogeneous materials.

We first carried out real-space charge-density imaging (RSCDI) of
SrTiO3 by performing four-dimensional (4D) scanning electron diffraction
using an AC-STEM equipped with a high-speed pixellated electron
detector (Gatan OneView; Fig. 1a, b and Methods). SrTiO3 is a simple
cubic perovskite with a projected square symmetry (Fig. 1c). In Fig. 1d,
the high-angle annular dark field (HAADF)-STEM image of SrTiO3 clearly
shows the Sr and Ti atomic columns, whereas O atoms cannot be seen
Fig. 1 | Experimental setup and the electric field in SrTiO3. a, The
experimental setup for 4D scanning electron diffraction using an AC-STEM. As
an electron probe scans across the two-dimensional (2D) surface of a sample, a
2D diffraction pattern at each point is acquired using a pixellated detector, and
a convergent beam electron diffraction (CBED) pattern is formed, enabling
further analysis. b, The as-acquired scanning diffraction dataset from a unit
cell in SrTiO3. a.u., arbitrary units. c, Atomic structure obtained from DFT
calculations. d, HAADF-STEM image of SrTiO3. Scale bar, 2 Å. e, Corresponding
electric-field map derived using the shift in the centre of mass from the
scanning diffraction dataset. The vector arrows represent the direction and
magnitude of the local electric field (in volts) in a unit cell of SrTiO3.


owing to their weak scattering of electrons. In the 4D dataset (Fig. 1b),
regions surrounding the Sr and TiO columns are brighter, as electrons
are scattered more strongly by heavy nuclei. Capturing the entire
diffraction pattern allows us to calculate the shift in the centre of the
diffraction intensity (also referred to as the centre of mass, COM) at each
point of the electron probe, which covers an area of 0.6 Å in diameter.
The lateral shift, ΔCOM, relates to the change in the momentum of the
electron beam, Δp⊥, and is negatively proportional to the local electric
field, E⊥, when the sample is thin (Extended Data Fig. 1). Explicitly,
the electric field can be calculated using:

$$E_\perp = -\frac{\Delta p_\perp\, v_z}{e\,\Delta z} = -\frac{\Delta\mathrm{COM}\; p_z\, v_z}{e\,\Delta z}$$

where Δz is the sample thickness, v_z is the speed of the electrons along
the beam direction, e is the electron charge and p_z is the linear momentum
of electrons along the beam direction.
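
As a rough illustration of the relation above, the following Python sketch computes the centre-of-mass shift of every diffraction pattern in a 4D array and converts it to an in-plane field. It is not the authors' code: the array layout `(scan_y, scan_x, det_y, det_x)`, the function names, the detector calibration `mrad_per_pixel` and the use of the scan-averaged COM as the zero-field reference are all assumptions made for this example. The product p_z v_z / e is evaluated relativistically from the beam energy, and e is taken as the magnitude of the electron charge.

```python
import numpy as np

ELECTRON_REST_ENERGY_EV = 510998.95  # m_e c^2 in eV


def pv_over_e(kinetic_energy_ev):
    """Return p_z * v_z / e in volts for an electron of given kinetic energy.

    Uses (pc)^2 = T^2 + 2*T*m_e c^2 and p*v = (pc)^2 / (T + m_e c^2).
    """
    t = float(kinetic_energy_ev)
    pc_squared = t * t + 2.0 * t * ELECTRON_REST_ENERGY_EV
    return pc_squared / (t + ELECTRON_REST_ENERGY_EV)


def com_shift(data4d, mrad_per_pixel):
    """Centre-of-mass shift (in radians) of every diffraction pattern.

    data4d has the assumed layout (scan_y, scan_x, det_y, det_x).
    The zero-shift reference is the COM averaged over all scan points,
    which is an assumption of this sketch, not a statement from the text.
    """
    data4d = np.asarray(data4d, dtype=float)
    ky = np.arange(data4d.shape[2])
    kx = np.arange(data4d.shape[3])
    total = data4d.sum(axis=(2, 3))
    com_y = (data4d.sum(axis=3) * ky).sum(axis=2) / total
    com_x = (data4d.sum(axis=2) * kx).sum(axis=2) / total
    scale = mrad_per_pixel * 1e-3  # detector pixels -> radians
    return (com_x - com_x.mean()) * scale, (com_y - com_y.mean()) * scale


def electric_field(dcom_rad, kinetic_energy_ev, thickness_m):
    """E_perp = -dCOM * p_z * v_z / (e * dz), returned in V/m."""
    return -dcom_rad * pv_over_e(kinetic_energy_ev) / thickness_m
```

In this model, a 300-keV beam passing through a 5.6-nm-thick sample and deflected by 1 mrad corresponds to a field magnitude of about 8.7 × 10¹⁰ V m⁻¹.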

Figure 1e shows the electric-field map derived from the 4D dataset
for a SrTiO3 sample with a thickness of 5.6 ± 1 nm (the sample thickness
was measured by least-squares fitting of the experimental
position-averaged convergent beam electron diffraction (PACBED) data with
the simulated datasets; see Extended Data Figs. 2, 3 for details). The
vectors show the direction and magnitude of the local electric field.
Note that the electric field is radially distributed around each Sr, Ti
and O atom, suggesting that it is highly symmetrical around the ions
in SrTiO3. Furthermore, it is obvious that the electric field around the
O columns is much weaker than that around the Sr and TiO columns.

In contrast to SrTiO3, BiFeO3 adopts a rhombohedral phase at room
temperature (Fig. 2a). DFT calculations show that electric polarization
in BiFeO3 originates from the deformation of the Bi 6s lone-pair
electrons, which drives the rotation of FeO6 octahedra along the (111)
axis and displaces the Fe atoms away from the centre of the surrounding
oxygen’s octahedral cage. Figure 2b shows an HAADF image
along the BiFeO3 (100) orientation, with the Bi and FeO columns easily
Fig. 2 | Atomic structure and electric-field dipole of BiFeO3. a, Atomic
structure of BiFeO3, obtained from DFT calculations. Bi atoms are in red, Fe in
blue and O in grey. b, Atomic-resolution HAADF-STEM image of BiFeO3; scale
bar, 4 Å. The Bi atomic columns are seen shifted towards the top right of each
unit cell (defined by the four Fe columns in the corners). c, The dipole moment
is polarized towards the top right, as shown by the vectors. d, Conventional
STEM bright-field image (left) and electric-field map (right) of BiFeO3,
generated from scanning diffraction data. Scale bar, 4 Å. e, Magnified
reconstructed dark-field STEM image of a BiFeO3 unit cell from the as-acquired
scanning diffraction dataset. Bi is marked in purple, Fe in red. The direction of
the electric field (E) is shown schematically. Scale bar, 1 Å. f, Corresponding
electric-field map, where the colour represents the magnitude of the local
electric field. The electric-field vector around Bi is stronger in the direction of
the bottom left corner. Around Fe columns, the electric field is also deformed
diagonally towards the bottom left.


identified from the Z contrast, as indicated in the inset atomic model.
In this region, Bi atoms are displaced by about 0.35 Å away from the
geometric centre of the four nearest FeO columns towards the top
right. In the polarization map (Fig. 2c), the arrows represent the dipole
moment, which is related to the shift of Bi atoms.

Figure 2d shows a bright-field STEM image (left) and an electric-field
map (right), both derived from the scanning diffraction data in the same
area as for Fig. 2b. A polarized electric field can be seen around all of
the atomic columns, and a net dipole moment that points diagonally
downwards is clearly revealed. A magnified image from the reconstructed
HAADF is shown in Fig. 2e. The electric field in the same region
is shown in Fig. 2f, where the dipole-induced polarized electric field is
shown at a sampling rate of 0.2 Å per pixel. It is worth noting that the
electric field surrounding the cation columns is no longer radially
symmetrical: the field is weak at the top right and stronger at the bottom
left across the Bi site.

Given the electric-field landscape, we can construct a 2D map of local
charge density, as these two quantities are related through Gauss’s law:

$$\nabla \cdot \mathbf{E} = \frac{\rho}{\varepsilon_0}$$

where ρ is the charge density and ε0 is the vacuum permittivity. Although
this equation is specified for three-dimensional (3D) geometries, we
may drop the z dependence by integrating both sides along the z axis
(the direction of the electron beam). Figure 3a, b shows 2D charge-density
images of SrTiO3 and BiFeO3 projected along the (001) plane,
derived from scanning diffraction experiments. The charge-density
maps contain negative contributions from both core and valence electrons
and positive contributions from atomic nuclei. Note that the
nuclear charge distributions appear as broad Gaussians because of the
shielding effect from core electrons and the size of the electron probe.
For direct comparison of theory and experiment, we used Gaussian
Fig. 3 | Real-space charge-density mapping in SrTiO3 and BiFeO3. a, Charge-
density map, ρ, of bulk SrTiO3 from scanning diffraction experiments.
b, Charge-density map of bulk BiFeO3. c, Charge-density image of bulk SrTiO3
obtained from DFT calculations. d, Charge-density image of bulk BiFeO3
obtained from DFT calculations. The simulated positive-charge distributions
for the nuclear charge and the core electrons are constructed using a three-
dimensional Gaussian function.


distributions to construct the positive-charge distributions of the
nuclear charge and the core electrons, added them to the negative-charge
distributions of the valence electrons obtained from DFT calculations,
and then projected the total charge-density distribution
from three to two dimensions (see Fig. 3c, d for SrTiO3 and BiFeO3,
respectively; see Methods for the construction of the positive-charge
density). The similarities in the key features shown in Fig. 3 (Fig. 3a, b
from experiment and Fig. 3c, d from theory) suggest that RSCDI indeed
reliably reveals the details of the charge distribution between atoms
in crystalline solids.
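
The step from a measured field map to a charge-density map through Gauss’s law can be sketched numerically as a finite-difference divergence of the two in-plane field components. The function name, the array layout (rows along y, columns along x) and the use of `numpy.gradient` are assumptions made for illustration; the text does not state how the derivative was evaluated.

```python
import numpy as np

EPSILON_0 = 8.8541878128e-12  # vacuum permittivity, F/m


def charge_density_2d(ex, ey, pixel_size_m):
    """Projected charge density rho = eps0 * (dEx/dx + dEy/dy).

    ex, ey: 2D arrays of the in-plane field (V/m), indexed [y, x].
    pixel_size_m: scan step in metres (a 0.2 Å step is 0.2e-10 m).
    Returns rho in C/m^3 on the same grid.
    """
    dex_dx = np.gradient(ex, pixel_size_m, axis=1)
    dey_dy = np.gradient(ey, pixel_size_m, axis=0)
    return EPSILON_0 * (dex_dx + dey_dy)
```

Because differentiation amplifies noise, a practical pipeline would probably smooth or filter the field maps first; that step is left out of this sketch.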

One immediate outcome of this charge-density mapping is the clear
appearance of the O atomic columns in both materials. Locating O
atoms is important for studying the octahedron rotation that is often
involved in the phase transitions and changes in the physical properties
of complex oxides such as La1−xSrxMnO3 and Sr2RuO4.
For SrTiO3 (Fig. 3a, c), the O columns are thin, indicating no rotation
of O octahedra away from the principal axis. In sharp contrast, the O
columns for BiFeO3 are elliptical (Fig. 3b, d) owing to a large rotation of
O octahedra (by about 11°, according to DFT calculations), resulting in
the splitting of the O columns in the 2D projection (Fig. 2a).

In the interstitial regions, the negative-charge density of SrTiO3
(Fig. 3a) displays a four-fold symmetry around the columns of Sr atoms,
matching the crystallographic symmetry of this material. Although the
ionic nature of bonding in SrTiO3 dominates the interatomic interaction,
the existence of intense red regions suggests an accumulation
of electron charge between O atoms and cations, indicating the small
covalent characteristics of Sr-O and Ti-O bonds in SrTiO3. In contrast,
the 2D charge-density image of BiFeO3 shows the deformed shapes of
positive charge on all atomic sites. As shown in Fig. 3b, d, the charge
contours on Fe sites exhibit triangular geometry, which originates from
the O atoms shifting away from the principal axis and anisotropic Fe-O
bonds with d orbitals of Fe atoms. The positive charge-density pocket
of Bi shows a partial connection to the closest O column, and the existence
of more intense red pockets between cations and O indicates
the stronger covalent nature of Bi-O and Fe-O bonds in BiFeO3 than of
the bonds in SrTiO3. This is also supported by analysis of the crystal
orbital Hamiltonian populations in Bi-based ferroelectric materials
such as BiMnO3. We note that regions of intense negative-charge
density migrate to the bottom left of the nuclei, from which we can
directly image the electric polarization in BiFeO3, along with the O
octahedron tilt.

As a further step, we calculated the positions of weighted centres of
positive and negative charge in BiFeO3, which allowed us to determine
the effective charge separation in a unit cell, a key physical quantity
of multiferroic materials. We found that the separation between the
positive and negative charge centres is about 0.57 Å along the diagonal
axis projected in the (100) plane (Extended Data Fig. 4). According to
the definition of the dipole moment, p = qd (where q is the charge and
d is the displacement), the measured p is (3.12 ± 0.9) e Å per unit cell,
corresponding to a ferroelectric polarization of 78 ± 23 µC cm⁻² along
the (111) direction, comparable to the reported value of 100 µC cm⁻².
Therefore, high-resolution RSCDI enables us not only to directly
see the charge polarization in BiFeO3, but also to quantitatively
determine its dipole moment.
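
The unit conversion behind these numbers can be checked with a short calculation: a dipole moment per unit cell divided by the cell volume gives a polarization. The sketch below is only an order-of-magnitude check. The pseudocubic cell edge of 3.96 Å is an assumed value that is not given in the text, and the projection geometry used by the authors to arrive at 78 µC cm⁻² is not reproduced here.

```python
E_CHARGE = 1.602176634e-19  # elementary charge, C
ANGSTROM = 1.0e-10          # m


def polarization_uc_per_cm2(dipole_e_angstrom, cell_volume_a3):
    """Convert a dipole moment per unit cell (in e*Å) to a polarization.

    P = p / V, returned in µC/cm^2 (1 C/m^2 equals 100 µC/cm^2).
    """
    p_si = dipole_e_angstrom * E_CHARGE * ANGSTROM  # C*m
    v_si = cell_volume_a3 * ANGSTROM ** 3           # m^3
    return 100.0 * p_si / v_si


# Assumed pseudocubic cell edge of 3.96 Å (hypothetical input, not from the text).
cell_volume = 3.96 ** 3
p_value = polarization_uc_per_cm2(3.12, cell_volume)
p_error = polarization_uc_per_cm2(0.9, cell_volume)
print(f"P = {p_value:.0f} +/- {p_error:.0f} uC/cm^2")
```

With this assumed cell volume the result is close to 80 µC cm⁻² with an uncertainty of about 23 µC cm⁻², which is consistent in magnitude with the value quoted above.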

The ability of RSCDI to quantify physical characteristics at high spatial
resolution makes it possible to study the complex interfaces found
in oxides, where many intriguing phenomena have been reported,
such as the existence of superconductivity, polarization vortices,
the quantum Hall effect and magnetism. Applications based
on these phenomena require a better understanding of the interplay
among lattices, electrons, orbitals and spin at the interface. BiFeO3 is
multiferroic and the effect of its strong electric polarization is predicted
to cause band bending in adjacent SrTiO3, which, in turn, exhibits several
emergent properties such as controllable interfacial conduction
and photovoltaics. However, owing to the absence of detailed experimental
information regarding charge rearrangement in the interfacial
region, understanding of the interplay among the atomic structure,
charge, orbital and spin relies almost solely on DFT calculations, which
are also limited by the size and complexity of the system.

Here we used RSCDI to image the electric field and charge distribution
at the interface in a thin film of BiFeO3 grown on (001) SrTiO3. As
suggested by the structural model obtained from DFT calculations
(Fig. 4a), the SrTiO3 substrate connects to BiFeO3 by sharing a BiO layer
at the interface. In Fig. 4b, the HAADF-STEM image shows a sharp contrast
between weak Sr intensity and strong Bi intensity. The atomically
sharp interface is also seen in atomic-resolution X-ray energy dispersive
spectrum (EDS) mapping (Extended Data Fig. 5). As shown in Fig. 4c, d,
we mapped the electric field and the projected charge density in the
same area as in Fig. 4b. The electric dipoles are visible in BiFeO3,
particularly near the Bi columns. On approaching the interface, the electric
dipoles become weaker in the bottom BiFeO3 layers, and some induced
dipoles appear around the Sr columns of the top SrTiO3 layers in the
substrate; a similar phenomenon involving the electric field penetrating
the insulator has been proposed at Pb(Zr,Ti)O3/SrTiO3 interfaces.

In the charge-density image, the interface effect between the insulating
SrTiO3 and the ferroelectric BiFeO3 is revealed in three ways at the
atomic scale: the O octahedron rotation, the electric polarization, and
the valence charge state of Ti. As shown in Fig. 4d, the oxygen columns
are well resolved in SrTiO3 but become vague and elongated in BiFeO3
owing to the rotation of FeO6 octahedra. However, the O atoms at the
interface appear to have less positivity than in SrTiO3 and less elongation
than in BiFeO3, indicating that the O octahedra are in an intermediate
state. In BiFeO3, the separation between the positive and negative
charge pockets is clear but weakens gradually when approaching the
interface. Surprisingly, the charge separation persists in SrTiO3, with a
smaller amplitude than in BiFeO3. In Fig. 4e, f, we correlatively plot the
atomic displacement, O octahedron rotation and charge separation in
each unit cell (see Methods and Extended Data Fig. 6). Both atomic
displacement and octahedral rotation change rapidly across the interface,
with the Bi displacement of roughly 0.35 Å in BiFeO3 changing to the Sr
displacement of roughly 0.13 Å in SrTiO3, and the octahedral rotation
falling sharply from 10° to less than 1°. The charge separation is more
adaptive, dropping early in the second layer of BiFeO3 from 0.54 Å to
0.33 Å and remaining the same across the interface until after the first
layer of SrTiO3, where it drops further to 0.23 Å. The atomic structure
combined with the electron charge relocation shows that the electrons
are most responsive to the propagation of the electric field of the
polarized BiFeO3, while the atomic displacement is more rigidly affected
by the atomic strain. This unsynchronized response takes place in


Fig. 4 | Charge-density map, O octahedron rotation and valence charge state
at the interface between SrTiO3 and BiFeO3. a, Atomic structure of SrTiO3/
BiFeO3 obtained from DFT calculations. b, Atomic-resolution HAADF-STEM
image of SrTiO3/BiFeO3. Scale bar, 4 Å. Arrows show the direction and relative
magnitude of Bi displacement. c, Corresponding electric-field map derived
from the scanning diffraction dataset. d, Charge-density map for SrTiO3/
BiFeO3. e, Changes in A-site displacement across the interface (that is, changes
in the displacement of the Bi or Sr atom from the geometric centre of the four
nearest Fe or Ti atoms), determined experimentally and by DFT calculation
(in ångströms; error bars denote standard deviation; see Methods). Also shown
is the O octahedron rotation determined experimentally (in degrees; error bars
denote the detection limit) and from DFT calculations (scattered points).
f, Charge separation between weighted centres of positive and negative charge
within unit cells across the interface. Error bars denote the detection limit.
g, Total charge of Ti + O and Fe + O on the two sides of the interface, measured
using RSCDI. Error bars denote standard deviation. h, Valence states of Ti and
Fe measured using high-energy-resolution EELS. Error bars denote standard
deviation.


the interfacial region of three unit cells (between the two blue dashed 
lines in Fig. 4e, f) and leads to the phenomenon of interface charging, 
which is the key to understanding and engineering the 2D electron or 
hole gas localized at the interface. 

To confirm the existence of an electron-rich interface, we calculated
the total charge in regions covering the Ti + O and Fe + O columns by
integrating the measured charge density (see Methods and Extended
Data Figs. 7-9). Figure 4g shows the total charge of the Ti + O and Fe + O
columns layer by layer; the charge of both columns drops at the interface.
Figure 4h and Extended Data Fig. 10 show the results of spatially
resolved electron-energy-loss spectroscopy (EELS) on Ti and Fe:
the Ti valence state decreases from 4+ to 3.7+, indicating a mixed state
of 4+ and 3+. The lowered Ti valence state seen in RSCDI and EELS reveals
the accumulation of electrons, a phenomenon bearing similarity to the
electron liquid that has been proposed at other oxide interfaces and
observed using in-line electron holography. Thus, using RSCDI, we
have demonstrated directly that the SrTiO3/BiFeO3 interface is electron
rich, emerging from differences in how the atomic structure and the
surrounding electron density evolve.

In summary, we have developed a new way of mapping the local
charge density of materials in real space using sub-ångström-resolution
STEM, and have established the validity of this technique by side-by-side
comparison with DFT calculations for SrTiO3 and BiFeO3. We have also
revealed the atomic-scale charge density at the interface of SrTiO3
and BiFeO3, as well as variation in electric dipoles and valence states
in the interfacial region. The ability to experimentally trace electron
redistribution and to probe local bonding in heterogeneous materials
at the subatomic level should have a substantial impact on the
characterization and design of functional materials.
Methods


Materials and sample preparation 

BiFeO3 films are grown on single-crystal (100) SrTiO3 surfaces by reactive
molecular beam epitaxy (MBE). The 50-nm BiFeO3 layer is deposited
at a substrate temperature of 625 °C in distilled ozone (roughly
80% ozone) at a partial pressure of 1 × 10⁻⁶ torr. Transmission electron
microscopy (TEM) samples of SrTiO3 and BiFeO3 thin films on SrTiO3
are prepared by hand polishing followed by ion milling, to provide a
large, uniformly thin area transparent to an electron beam.


Imaging methods for scanning diffraction and instrumentation 
Scanning diffraction datasets are collected by synchronizing the
scanning beam in STEM mode with one of the primary cameras on a
microscope that is generally used to capture TEM images or diffraction
patterns. During STEM imaging, the electron beam is converged
into an electron probe with a width of less than 0.6 Å (with aberration
correction). In the detector plane, the electron beam forms a convergent
beam electron diffraction (CBED) pattern. Conventional STEM
images are built up by collecting the scattered or transmitted electrons
during the raster of the electron probe over the sample; using
high-angle annular dark field (HAADF), annular bright field (ABF) and
bright-field detectors, electrons scattered into different solid angles
yield a single value for each position of the scanning probe. In scanning
diffraction, we collect the entire CBED pattern as a 2D image for
each position; in combination with the 2D scanning area, this results
in a 4D dataset.

In experiments, the 4D data are collected on a JEOL JEM 300CF
double-aberration-corrected STEM at 300 kV; CBED patterns are recorded
with a Gatan OneView camera at a speed of 300 frames per second
(fps), each frame having a size of 512 × 512 pixels. A semi-convergence
angle of 32 mrad is used for probe forming in imaging electric field.
Beam scanning is synchronized with the camera using the Gatan STEMx
system, with a scanning step size of 0.2 Å.

In previous attempts to derive charge-density information through
scanning diffraction, noise stemmed from two main factors. First, the
slow acquisition rate of the pixellated camera (30-40 fps) exacerbated
the effects of sample drift, electron-beam damage and scanning noise,
which all lead to image noise that obscures the charge information.
At the same time, owing to the slow scanning rate, finer scanning with
more sampling points could not be achieved, although finer scanning is
required to show the detailed electric field and 2D charge image. Second,
an insufficient signal-to-noise ratio and number of pixels for capturing
the diffraction pattern during scanning lowered the accuracy in
measuring the electric field and charge.

In our experiment, we have improved on these factors. First, we use
a Gatan OneView camera and K2 camera, which have a base acquisition
rate of 300 fps; this offers both faster scanning and more sampling,
minimizing sample drift during scanning and improving spatial resolution.
Second, during imaging we use a weak beam intensity with a
larger camera length, projecting the CBED patterns onto the 512 × 512
pixel camera. In combination with the high dynamic range of the
complementary metal oxide semiconductor (CMOS)-based cameras, this
enhances the details of the diffraction patterns.


Image construction 

HAADF and bright-field images are reconstructed from 4D data by
integrating the intensity over the angular ranges of 65-172 mrad and
0-32 mrad, respectively, in the CBED patterns. These virtual detectors
are analogous to the physical HAADF and bright-field detectors
in STEM imaging.
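
The virtual-detector integration can be sketched as follows. This is a minimal illustration, not the authors' code: the array layout (scan_y, scan_x, det_y, det_x), the function name `virtual_detector`, the calibration value and the assumption that the pattern is centred on the detector are all hypothetical.

```python
import numpy as np

def virtual_detector(data4d, mrad_per_pixel, inner_mrad, outer_mrad, centre=None):
    """Integrate each CBED pattern over inner_mrad <= angle < outer_mrad.

    data4d is assumed to have shape (scan_y, scan_x, det_y, det_x).
    """
    det_y, det_x = data4d.shape[2:]
    if centre is None:
        # Assumption: the undeflected beam sits at the detector centre.
        centre = ((det_y - 1) / 2.0, (det_x - 1) / 2.0)
    yy, xx = np.indices((det_y, det_x))
    angle = np.hypot(yy - centre[0], xx - centre[1]) * mrad_per_pixel
    mask = (angle >= inner_mrad) & (angle < outer_mrad)
    # One value per probe position, as for a physical STEM detector.
    return (data4d * mask).sum(axis=(2, 3))

# Toy dataset: 4 x 5 probe positions, 64 x 64 detector pixels.
rng = np.random.default_rng(0)
data4d = rng.poisson(5.0, size=(4, 5, 64, 64)).astype(float)
bright_field = virtual_detector(data4d, 6.0, 0.0, 32.0)
haadf = virtual_detector(data4d, 6.0, 65.0, 172.0)
```

The same function covers any annular range, so an ABF image would only need different inner and outer angles.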


Calculation of electric field 


Owing to the negative charge of electrons, when an electron probe
transmits through an electric field, the electrons will be deflected. In
the simplified model of a uniform electric field, the probe shifts in
the diffraction plane by an amount proportional to the field and in the
opposite direction, owing to the momentum change induced by the
electric field. In our scanning diffraction experiment, the use of the
pixellated electron detector allows the acquisition of each diffraction
pattern in its entirety, as well as the calculation of the momentum
change on the basis of the redistribution of diffraction intensity.

Calculation of the electric field is based on the change in momentum 
of the electron beam. To analyse the scanning diffraction dataset, 
we apply a circular mask to eliminate intensity from the high angles 
(more than 64 mrad), because it has been shown that scattering at high 
angles is less sensitive than low-angle scattering to the momentum 
transfer of the primary beam. In addition, by eliminating much of
the detector’s area where the measured intensity is low, we eliminate
a large source of noise from the centre-of-mass (COM) calculation. We have
found that this mask substantially decreases the noise level in our
final electric-field images.

Next, we calculate the COM of each diffraction pattern in the dataset;
the centre of the diffraction pattern is determined by averaging the
COM within 64 mrad of all diffraction patterns. Then, at each scanning
point, the deviation of the COM from the centre of the diffraction
pattern gives us a vector field. As shown in ref. ?, the COM of the diffraction
pattern is equivalent to the momentum transfer. Extended Data Fig. 1
shows the strength of the electric-field change at different locations
within SrTiO₃ samples of varying thicknesses, with the scanning
diffraction data being generated using multi-slice simulation. For
samples thinner than 6 nm, the momentum transfer is proportional to the
increase in sample thickness. Fitting the electric-field strength versus
thickness plot with a linear curve shows that the precision measured
by standard deviation is 3.5% for weak field regions between atoms
(blue), 1.3% for areas closer to atoms (red), and 6.7% for areas very
close to atoms (black). The change in the linear relationship is due
to the beam being continuously shifted when propagating through
the sample. For thicker samples, the accuracy gradually falls off. In
samples with thicknesses of around 6 nm, the centre of mass can
still be interpreted as being representative of the local electric field, with
some loss in quantitative accuracy.
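
A minimal sketch of the masked COM calculation is given below. It is illustrative only: the function name `com_shift`, the array layout (scan_y, scan_x, det_y, det_x) and the choice of the detector centre as the origin of the 64-mrad mask are assumptions made for this example.

```python
import numpy as np

def com_shift(data4d, mrad_per_pixel, mask_mrad=64.0):
    """Return the COM deviation (dy, dx) in mrad at every probe position."""
    det_y, det_x = data4d.shape[2:]
    yy, xx = np.indices((det_y, det_x)).astype(float)
    # Assumption: the mask is centred on the middle of the detector.
    cy0, cx0 = (det_y - 1) / 2.0, (det_x - 1) / 2.0
    radius = np.hypot(yy - cy0, xx - cx0) * mrad_per_pixel
    masked = data4d * (radius <= mask_mrad)   # drop high-angle intensity
    total = masked.sum(axis=(2, 3))
    com_y = (masked * yy).sum(axis=(2, 3)) / total
    com_x = (masked * xx).sum(axis=(2, 3)) / total
    # Pattern centre = COM averaged over all diffraction patterns.
    dy = (com_y - com_y.mean()) * mrad_per_pixel
    dx = (com_x - com_x.mean()) * mrad_per_pixel
    return dy, dx
```

The two returned arrays together form the vector field described above; dividing by zero is not guarded against, so every masked pattern is assumed to contain some intensity.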

To measure the electric field in detail, the procedure is as follows.
First, calibrate the pixel size in momentum space (in units of mrad per
pixel). Second, the momentum change of electrons is based on:

$$\frac{\mathrm{d}\mathbf{p}_{\perp}}{\mathrm{d}t} = -e\,\mathbf{E}_{\perp}$$

Solving this differential equation with the assumptions outlined in
ref. ’, the momentum change can be written in terms of a few simple
parameters:

$$\Delta\mathbf{p}_{\perp} = -\frac{e\,\mathbf{E}_{\perp}\,\Delta z}{v_{z}}$$

where $\Delta\mathbf{p}_{\perp}$ is the momentum change, $e$ is the magnitude of
the charge of an electron, $\Delta z$ is the sample thickness, $v_{z}$ is the speed
of the electron along the beam direction and $\mathbf{E}_{\perp}$ is the electric
field. Given that the diffraction pattern is a momentum space image of
the probe, $\Delta\mathbf{p}_{\perp}$ can be calculated from the shift in the COM of
the diffraction pattern:

$$\Delta\mathbf{p}_{\perp} = \Delta\mathrm{COM}\; p_{z}$$

where $\Delta\mathrm{COM}$ is the shift in the COM from the geometric centre (in
milliradians) and $p_{z}$ is the momentum of the electron beam along the beam
direction. Third, the electric-field strength can be calculated using:

$$\mathbf{E}_{\perp} = -\frac{\Delta\mathrm{COM}\; p_{z}\, v_{z}}{e\,\Delta z}$$

Measurement of specimen thickness

Position-averaged CBED (PACBED) is carried out by acquiring the average
CBED pattern in the diffraction plane while the electron beam scans
across a small area. A smaller convergence angle of approximately
10.6 mrad is used. Previous work has shown that the thickness and
orientation of the sample have a large influence on the PACBED pattern.
Here, by recording CBED patterns with a low convergence angle
and incoherently averaging over larger regions (roughly 1 nm²), we
compare the experimental results with simulated PACBED patterns to
quantitatively determine the thickness of our samples. The database of
simulated PACBED patterns is generated using multi-slice simulation,
and contains PACBED patterns from different sample thicknesses at the
same imaging condition (300 kV, 10.6 mrad). The sample thickness is
then estimated with a precision of around 1 nm. The tilt of the sample
is also more apparent under these conditions than in standard imaging
conditions, so we use PACBED to align the sample to the zone axis
more accurately during data acquisition.

The PACBED in Extended Data Fig. 2 is taken experimentally from the
boxed area of SrTiO₃ labelled in the HAADF-STEM image in Extended
Data Fig. 2a. This PACBED is compared with simulated PACBEDs from
samples of thickness 0.8-16 nm. Quantitatively, least-squares fitting
is carried out on PACBEDs from an SrTiO₃ sample and from the SrTiO₃
part of the interface, showing that the SrTiO₃ region is around 5.6 ± 1 nm
thick. Extended Data Fig. 3 shows a PACBED taken from the selected
BiFeO₃ region; this PACBED is comparable to simulated PACBEDs from
BiFeO₃ samples of thickness 6-8 nm.
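
The least-squares comparison against a simulated library can be sketched as below. The function name `best_thickness`, the normalization to unit total intensity and the use of a plain sum of squared differences are assumptions for illustration; the simulated patterns themselves would come from a multi-slice code that is not shown.

```python
import numpy as np

def best_thickness(experimental, simulated, thicknesses_nm):
    """Pick the simulated PACBED that minimizes the squared residual.

    experimental: 2D array; simulated: sequence of 2D arrays of the same
    shape; thicknesses_nm: thickness label for each simulated pattern.
    """
    exp = experimental / experimental.sum()   # compare shapes, not dose
    residuals = []
    for pattern in simulated:
        sim = pattern / pattern.sum()
        residuals.append(float(((exp - sim) ** 2).sum()))
    best = int(np.argmin(residuals))
    return thicknesses_nm[best], residuals
```

Plotting the residuals against thickness gives a curve whose minimum marks the estimate and whose width indicates how well the thickness is constrained.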


Mapping charge density 

After deriving the electric-field landscape, we can calculate the divergence
of our measured electric field and then determine the charge
density by using Gauss’s law: the divergence of the electric field is
proportional to the charge density. After mapping the charge density
in a unit cell, we can separate the positive and negative charges and
calculate their weighted centres, as in Extended Data Fig. 4 for BiFeO₃.
In BiFeO₃, the measured positive and negative charge centres separate
along the diagonal direction with a spacing of 0.57 Å.
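
As a hedged illustration of this step, the sketch below converts a COM deflection into a field using E = −ΔCOM·p_z·v_z/(e·Δz), which follows from the momentum-transfer relation in the previous section, and then applies Gauss's law in two dimensions. The function names, the relativistic expression used for p_z·v_z and the finite-difference divergence are choices made for this example, not details taken from the paper.

```python
import numpy as np

EPS0 = 8.8541878128e-12   # vacuum permittivity, F/m
MC2_EV = 510998.95        # electron rest energy, eV

def pv_in_ev(kinetic_ev):
    """Relativistic product p*v of a beam electron, expressed in eV."""
    return kinetic_ev * (kinetic_ev + 2 * MC2_EV) / (kinetic_ev + MC2_EV)

def field_from_com(dcom_rad, thickness_m, kinetic_ev=300e3):
    """Thickness-averaged field in V/m from a COM deflection in radians."""
    # (p*v in eV) / e is a voltage, so the result is in V/m.
    return -dcom_rad * pv_in_ev(kinetic_ev) / thickness_m

def charge_density(e_y, e_x, step_m):
    """Gauss's law in 2D: rho = eps0 * (dEx/dx + dEy/dy), in C/m^3."""
    div = np.gradient(e_x, step_m, axis=1) + np.gradient(e_y, step_m, axis=0)
    return EPS0 * div
```

For example, `field_from_com(dx_mrad * 1e-3, 5.6e-9)` would turn a map of COM shifts in mrad into a field map for a 5.6-nm-thick region, and `charge_density(e_y, e_x, 0.2e-10)` would use the 0.2 Å scan step as the grid spacing.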


Image correction 

Even though the acquisition speeds of fast CMOS cameras have improved
substantially, reaching only a few milliseconds per frame, the pixel dwell
time is still more than a hundred times longer than that of ADF and
bright-field detectors, which operate at the microsecond level. Sample drift
and image distortion therefore occur and can be seen in the reconstructed
HAADF and bright-field images. This geometric image distortion needs to be
corrected. We used 2D Gaussian fitting on the reconstructed HAADF
image to determine the atom column position in the scanning diffraction
experiment. Using a conventional HAADF image as a reference, the
deviation of the atomic locations in scanning diffraction is compensated
using geometric transformation, where each pixel location is corrected
on the basis of the local correction matrix, considering lattice rotation
and scaling. The corrected data with reduced distortion can then reveal
more details on the electric-field distribution. Unit cells with similar
patterns can be averaged to improve the signal-to-noise ratio.

In detail, the sample drift along the horizontal direction is corrected 
by measuring the average displacement per line from a conventional 
fast-scanned STEM image and then shifting each line of pixels back. Drift 
along the vertical scan direction is corrected by rescaling the vertical 
axis to fit with the fast-scanned STEM image. These operations are first
performed on the reconstructed HAADF and bright-field images on the
basis of the visible lattice, and the same shift and rescaling operations
are then applied to the electric-field maps.


DFT calculation 
First-principles DFT calculations are performed with the Vienna ab
initio simulation package (VASP). The spin-polarized generalized
gradient approximation (GGA) is used to describe the exchange-correlation
interaction among electrons. We treated Bi 6s 6p, Fe 3d 4s, Sr
4s 4p 5s, Ti 3d 4s and O 2s 2p as valence states and adopted
projector-augmented wave (PAW) pseudopotentials to represent the effects of
their ionic cores. Spin-orbit coupling is not included in the calculation.
To describe the correlation effect properly, we use the GGA + U
method for the localized d orbitals of Fe (Coulomb repulsion U = 3.0 eV;
on-site exchange J = 0.0 eV). Calculations of 2 × 2 × 2 supercells are
carried out to simulate, first, the rotation of adjacent O octahedra and
distorted Bi and Fe sites along the [111] direction; and second, the G-type
antiferromagnetic exchange coupling between Fe atoms. We sample
the Brillouin zone by adopting the Γ-centred Monkhorst-Pack method
with a density of about 2π × 0.03 Å⁻¹ in all calculations. Brillouin zone
integrations are performed with a Gaussian broadening of 0.05 eV
during all calculations. The energy cut-off for the plane-wave expansion
is 500 eV, which results in good convergence of the computed
ground-state properties, according to our previous investigations of
oxides. Structures are optimized with the criterion that the atomic
force on each atom becomes weaker than 0.01 eV Å⁻¹ and the energy
convergence is better than 10° eV. 

To corroborate our experimental observations and to validate our
conclusion that the observed image contrast results from local charge
density, we derive the electron charge density of bulk SrTiO₃ and BiFeO₃,
projected along the (001) direction, through DFT calculations. It is well
known that DFT allows the solving of one-electron Kohn-Sham equations
and the evaluation of electron density by using a lattice potential
acting on the system’s electrons. To appreciate the positive-charge
effect from the nucleus in experiments, a Gaussian distribution of
effective core charge at different nucleus sites is constructed, with
the conservation condition that the positive charge is equal to the negative
charge (the total number of electrons) in the system. The Gaussian
broadening widths for Sr, Ti, Bi, Fe and O are 0.40 Å, 0.31 Å, 0.50 Å,
0.33 Å and 0.30 Å, respectively. The DFT-calculated charge-density
image is then derived by summing the electron charge density of the
ground states and the Gaussian-distributed positive-charge density.


Quantification of O octahedron rotation 

In RSCDI images, O can be seen from the positive-charge intensity.
Without O octahedron rotation, in SrTiO₃, the O atoms in the face centres
overlap along the projection of (100); with rotation, in BiFeO₃, the
O atoms in face centres split along the projection. Such splitting leads
to weakening and elongation of the O charge intensity.

In Extended Data Fig. 6, we plot the intensity and width from O columns
along with the rotation angle in a charge-density image from DFT.
In general, with higher octahedron rotation, the O intensity elongates
and the peak intensity decreases. From one to seven degrees of rotation,
no obvious change in the width of the oxygen intensity is seen.
However, the peak intensity is more sensitive to octahedron rotation
in this range, as shown by the blue points and linear fit in the plot. The
linear fitting between O intensity and octahedron rotation is used to
quantify the O rotation in Fig. 4e; the fitting shows that the deviations 
of all points are less than one degree from the linear curve, so the error 
bars for measured octahedron rotation are chosen to be one degree. 


Measurement of total charge 

The total charge from individual atomic columns is measured by integrating
the intensity within a region around each atomic column in the
charge-density image. To define the centre of all heavy atomic columns,
we use the peak position of atoms in the HAADF image calculated by
2D Gaussian fitting. To determine the size of the inclusion area, we
apply a method from DFT for calculating the charge state of atoms,
Bader charge analysis, which uses the saddle points of the charge-density
contour between atoms to define the boundary for electrons
belonging to different atoms. As shown in Extended Data Fig. 7, in our
images, we look for the local minimum in the charge density and use
this point as the boundary in calculating the charge state from the
charge-density image.

In our measurements of total charge for each atomic column, note
that in SrTiO₃, along the (100) projection, Ti and O atoms overlap, so
the integrated charge is from Sr, O and Ti + O columns. In BiFeO₃, Fe and
O overlap, so that the measured charge is from Bi and Fe + O columns.
We do not consider the O columns in BiFeO₃ because these columns
split. For each type of atomic column, the intensity is measured and
shown in the histograms in Extended Data Fig. 8. From the fitted peaks,
Bi intensity centres on 9.58, Sr on 5.71, Ti + O on 4.45, Fe + O on 2.51
and O on −1.50. As the measured intensity resembles the total charge,
including positive nuclei and core orbital electrons, the charge intensity
reflects the valence of each type of atomic column.

To estimate the partial charge of the atoms, we carry out a Bader
charge calculation for bulk SrTiO₃ and BiFeO₃. For bulk SrTiO₃,
we find partial charges of 8.421e, 1.445e and 7.378e for Sr, Ti and O
atoms, respectively (Extended Data Fig. 8b). Because there are respectively
10, 4 and 6 valence electrons for Sr (4s² 4p⁶ 5s²), Ti (3d² 4s²) and
O (2s² 2p⁴) atoms, their corresponding valence states are Sr$^{1.58+}$, Ti$^{2.55+}$
and O$^{1.38-}$. Similarly, in bulk BiFeO₃, we find 2.131e, 6.119e and 7.584e
for Bi, Fe and O atoms, respectively. Considering that there are respectively
5, 8 and 6 valence electrons for Bi (6s² 6p³), Fe (3d⁶ 4s²) and
O (2s² 2p⁴) atoms, their corresponding valence states are Bi$^{2.87+}$, Fe$^{1.88+}$
and O$^{1.58-}$.
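
The arithmetic behind these valence states is simply the number of valence electrons minus the Bader electron count. The short sketch below reproduces it from the numbers quoted in the text; the dictionary and function names are illustrative.

```python
def partial_charges(valence_electrons, bader_electrons):
    """Partial charge = valence electrons minus Bader electron count."""
    return {el: valence_electrons[el] - bader_electrons[el]
            for el in valence_electrons}

# Values quoted in the text for bulk SrTiO3 and BiFeO3.
srtio3 = partial_charges({"Sr": 10, "Ti": 4, "O": 6},
                         {"Sr": 8.421, "Ti": 1.445, "O": 7.378})
bifeo3 = partial_charges({"Bi": 5, "Fe": 8, "O": 6},
                         {"Bi": 2.131, "Fe": 6.119, "O": 7.584})

# Projected columns that contain two species add their partial charges.
ti_o_column = srtio3["Ti"] + srtio3["O"]
fe_o_column = bifeo3["Fe"] + bifeo3["O"]

for name, charges in (("SrTiO3", srtio3), ("BiFeO3", bifeo3)):
    print(name, {el: round(q, 3) for el, q in charges.items()})
```

A positive result means the atom has given up electrons relative to its neutral valence configuration, and a negative result means it has gained them.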

We note that Bader charge calculation uses saddle points (or ‘zero
flux surfaces’) at which the charge density is a minimum in order to
separate atoms from each other. Owing to the covalent bond nature
of the SrTiO₃ and BiFeO₃ systems, the charge states obtained through
Bader charge analysis are usually underestimated (and so are not fully
consistent with the definition of ‘valence’ in chemistry) compared with
the experimental results obtained through EELS.

In stoichiometric phase BiFeO₃, the valence states of Bi, Fe and O
are 3+, 3+ and 2−, respectively, which gives the Fe + O column a valence
state of 1+ when projected along (100). In SrTiO₃, the valence states of
Sr, Ti and O are 2+, 4+ and 2−, respectively, making the projected Ti + O
column 2+. If we order each atomic column according to its valence
state (Bi, 3+; Sr, 2+; Ti + O, 2+; Fe + O, 1+; and O, 2−), we can see that this
matches the ordering of the integrated intensities found in the
charge-density image. However, as discussed above, the partial charge derived
from charge density (both experimentally and through DFT) could
be different from the definition of ‘valence’ in chemistry. Therefore,
we compare the total charge intensity with the partial charge derived
through DFT. Now, Bi is 2.87+, Sr is 1.58+, Ti + O is 1.17+, Fe + O is 0.3+
and O is 1.38− (Extended Data Fig. 9a). All charges are well fitted with
a linear curve plotting the charge intensity against the partial charge
from DFT. We then use this linear relationship to estimate the charge
of Ti + O and Fe + O columns across the interface in Fig. 4g (where the
error bar denotes the standard deviation from all atomic columns in a
row parallel to the interface). We emphasize that the charge variation
measured using RSCDI assumes a uniform sample thickness, which
can be determined by PACBED.
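
The linear calibration can be illustrated with the five pairs of values quoted in the text (integrated intensity versus DFT partial charge, with O taken as negative). The ordinary least-squares fit and the helper name `charge_from_intensity` are assumptions of this sketch; the paper does not state how the fit was performed.

```python
# DFT partial charge (x) and measured charge intensity (y) per column type.
columns = ["Bi", "Sr", "Ti+O", "Fe+O", "O"]
partial_charge = [2.87, 1.58, 1.17, 0.30, -1.38]
intensity = [9.58, 5.71, 4.45, 2.51, -1.50]

n = len(columns)
mean_x = sum(partial_charge) / n
mean_y = sum(intensity) / n
s_xx = sum((x - mean_x) ** 2 for x in partial_charge)
s_xy = sum((x - mean_x) * (y - mean_y)
           for x, y in zip(partial_charge, intensity))

slope = s_xy / s_xx                  # intensity per unit charge
intercept = mean_y - slope * mean_x  # intensity at zero partial charge

def charge_from_intensity(value):
    """Invert the calibration line to estimate a column's partial charge."""
    return (value - intercept) / slope

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Applying `charge_from_intensity` to the integrated intensity of each Ti + O or Fe + O column would give a charge profile of the kind described for Fig. 4g.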

In Extended Data Fig. 9a, the errors may derive from uncertain- 
ties in experiments and analysis, including the Poisson noise from 
the pixellated detector, sample drift and scanning noise in STEM, the 
distribution of charge intensities of each species, and uncertainties in 
determining the area used to calculate the integrated charge. 

By comparing experimentally measured partial charges with those 
from DFT calculations, we show that quantification of partial charge 
using RSCDI can be reliable. Relative changes in atomic charge can be 
easily observed by comparing the partial charge of atomic columns,
when the area is uniform in thickness. Fully quantitative measurement
depends on the fitting curve, which can be achieved in two ways: first, by
precisely measuring the sample thickness, calibrating the instrument,
and using a standard sample as a reference; and second, by comparison
with DFT calculations.


We also note that, when measuring the total charge, RSCDI will work
best when the sample is along a zone axis in which each atomic column 
comprises only one atom species, so that atoms with different charge 
states will not overlap. The projection of 3D charge to two dimensions 
can also result in underestimation of those bond charges that are dis- 
tributed among atoms at different heights. This effect needs further 
study. 


Electron energy loss spectroscopy 

High spatial resolution STEM-EELS experiments are carried out on
a Nion UltraSTEM 200, equipped with a C₃/C₅ corrector and a
high-energy-resolution monochromated EELS system (HERMES). The instrument
is operated at 60 kV with a convergence semi-angle of 30 mrad
and a beam current of about 100 pA. For spectrum image acquisition,
we use a dispersion of 0.19 eV per channel, a dwell time of 0.4 per pixel
and a pixel size of 0.1 nm. The background in each spectrum is removed
by fitting a power-law function to the pre-edge region using the
commercial software package DigitalMicrograph. To reveal the valence state
of Ti, we use the multiple least squares (MLS) method to separate the
Ti³⁺ and Ti⁴⁺ components of each Ti EEL spectrum using the following
equation: S(E) = a₁R₁(E) + a₂R₂(E) + χ(E), where E is the energy loss; S(E)
is the experimental EEL spectrum; R₁(E) and R₂(E) are standard Ti³⁺ and
Ti⁴⁺ spectra; a₁ and a₂ are fit coefficients (spectral weights) of the Ti³⁺ and
Ti⁴⁺ components; and χ(E) is the residual spectrum. The fitting is
carried out using the commercial software DigitalMicrograph. The
valence state of Ti is further calculated from the weighted arithmetic
mean of the Ti³⁺ and Ti⁴⁺ components (Fig. 4h). To analyse the valence
state of Fe, we measure the difference between the onset energies of
the O K-edge and Fe L₃-edge, which were determined by the energy
loss when the edge reaches 10% of its maximum intensity. Owing to the
linear relationship between this Fe-O onset energy difference and the
valence state of Fe, we can further calculate the localized valence
state of Fe (Fig. 4h; the error bar denotes the standard deviation from
all points collected within the atomic columns).
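
The MLS decomposition and the weighted-mean valence can be sketched with an unconstrained linear least-squares fit. This is an illustration, not the DigitalMicrograph routine: the function name `mls_valence` is hypothetical, the reference spectra are assumed to be sampled on the same energy axis as the measurement, and no non-negativity constraint is applied to the coefficients.

```python
import numpy as np

def mls_valence(spectrum, ref_ti3, ref_ti4):
    """Fit S(E) = a1*R1(E) + a2*R2(E) + chi(E) and return the Ti valence."""
    design = np.column_stack([ref_ti3, ref_ti4])
    coeffs, *_ = np.linalg.lstsq(design, spectrum, rcond=None)
    a1, a2 = coeffs
    residual = spectrum - design @ coeffs          # chi(E)
    # Weighted arithmetic mean of the 3+ and 4+ components.
    valence = (3.0 * a1 + 4.0 * a2) / (a1 + a2)
    return a1, a2, valence, residual
```

With background-subtracted spectra, a column fitted as 25% Ti³⁺ and 75% Ti⁴⁺ would be assigned a valence of 3.75.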

There is a slight discrepancy between the charge states measured by
EELS and by charge-density imaging: the charge in SrTiO₃ drops earlier
in Fig. 4g and later in Fig. 4h. This discrepancy derives from the
delocalization of the EEL spectrum, such that the signal can be influenced
by adjacent regions. As measured in EELS, the slightly higher valence
state of Ti close to the interface is the result of the valence of Ti from the
bulk part of the SrTiO₃ being delocalized and detected by the probe at
the interface. However, charge-density imaging is a more local probe:
it is affected only by the field within its interaction volume, and we only
integrate the charge density close to the atomic column. Therefore, the
local charge measured by charge-density imaging more clearly shows
a change in the charge state of individual atomic columns, rather than
in the material as a whole, which is measured with EELS.


Data availability 


The datasets generated or analysed here are available from the
corresponding authors on reasonable request.

Extended Data Fig. 1 | Measured electric-field strength in SrTiO₃ films of
different thicknesses. The electric-field strength at locations close to Sr
atoms (black), farther away (red) and farthest away, in between the Sr and O
atomic columns (blue), was calculated from simulated diffraction data with
different sample thicknesses up to 6 nm. The measured electric-field strength
is shown as points, and the dashed lines denote linear fitting. The inset shows
the sampling locations for each line on a map of the simulated electric field.
Diffraction data were generated using multi-slice simulations in which
conditions were matched to experimental conditions.

Extended Data Fig. 2 | Measuring SrTiO₃ thickness using PACBED. a, HAADF-STEM
image of SrTiO₃. b, Least-squares fitting of the experimental PACBED
results with the simulated PACBEDs (red line). The inset shows the PACBED
acquired from the boxed region in the STEM image, and simulated PACBEDs of
SrTiO₃ with thicknesses from 0.8 nm to 10.4 nm.

Extended Data Fig. 3 | Measuring BiFeO₃ thickness using PACBED. Shown are
the PACBED acquired in experiment and simulated PACBEDs for BiFeO₃ with
thicknesses of 2-10 nm.

Extended Data Fig. 4 | Separation of positive and negative charge in a BiFeO₃
unit cell. a-c, Negative charge (a); positive charge (b); and overlapping of
positive and negative charge (c) in the pseudo-cubic unit cell of BiFeO₃.
d, Positions of positive (blue) and negative (red) charge centres.

Extended Data Fig. 5 | Atomic-resolution EDS maps across the BiFeO₃/SrTiO₃
interface. The EDS map was acquired using a JEM 300CF AC-STEM system with
EDS dual silicon-drift detectors (SDDs). Thirty scans (each with a 0.4-ms dwell
time) in the same area across the interface were aligned and summed. The
HAADF-STEM image and atomic-resolution EDS maps of Bi, Fe, Sr and Ti reveal
an atomically sharp interface.

Extended Data Fig. 6 | Measurement of O octahedron rotation. a, Atomic
model of the BiFeO₃/SrTiO₃ interface, which is relaxed and then calculated by
DFT. The rotation of O octahedra is readily visible from the splitting of the O
atoms in this projection. b, Charge-density image calculated using DFT. The
images of O charge become elongated and weak with higher O octahedron
rotation. c, Intensity of O column charge (blue) and width of O intensity (red)
plotted against O octahedron rotation measured using the atomic model from
DFT calculations.

Extended Data Fig. 7 | Determination of the region for measuring the total
charge of atomic columns. a, 2D charge-density image of SrTiO₃. b,
Charge-intensity profile drawn along the horizontal (red) and vertical (blue)
directions as shown in a. Local minima in the charge-intensity profile are
defined as the boundary of the area included for integrating the charge.

Extended Data Fig. 8 | Measurement of the total charge of each atomic site.
Histograms showing the integrated intensity of Bi columns, Fe + O columns, O
columns, Sr columns and Ti + O columns from charge-density images of BiFeO₃
and SrTiO₃.

Extended Data Fig. 9 | Charge-intensity change as a function of valence.
a, Integrated intensity in each atomic column in the charge-density image,
plotted as a function of valence derived through DFT to show their correlation.
The red line is the linear fit. b, Partial charge and valence states of all atoms
derived through Bader charge analysis in DFT.

Extended Data Fig. 10 | High-resolution core-loss EELS measurement of Ti,
O and Fe at the SrTiO₃/BiFeO₃ interface. a, HAADF-STEM image used for
acquiring EELS data on the SrTiO₃/BiFeO₃ interface. Scale bar, 1 nm. b-d,
Stacking EEL spectra of the Ti L₂,₃-edge (b); O K-edge (c); and Fe L₂,₃-edge (d)
across the interface; the horizontal axes show energy loss (eV). The location
of each coloured spectrum is marked by the colour bar in a. Each spectrum is
averaged in the direction parallel with the SrTiO₃/BiFeO₃ interface. The
purple, yellow and maroon arrows indicate respectively the top edge, interface
and bottom edge of the mapping region in a.

The exchange of volatile species (water, carbon dioxide, nitrogen and halogens)
between the mantle and the surface of the Earth has been a key driver of
environmental changes throughout Earth’s history. Degassing of the mantle requires
partial melting and is therefore linked to mantle convection, whose regime and vigour
in the Earth’s distant past remain poorly constrained. Here we present direct
geochemical constraints on the flux of volatiles from the mantle. Atmospheric xenon
has a monoisotopic excess of ¹²⁹Xe, produced by the decay of extinct ¹²⁹I. This excess
was mainly acquired during Earth’s formation and early evolution, but mantle
degassing has also contributed ¹²⁹Xe to the atmosphere through geological time.
Atmospheric xenon trapped in samples from the Archaean eon shows a slight
depletion of ¹²⁹Xe relative to the modern composition, which tends to disappear in
more recent samples. To reconcile this deficit in the Archaean atmosphere by mantle
degassing would require the degassing rate of Earth at the end of the Archaean to be at
least one order of magnitude higher than today. We demonstrate that such an intense
activity could not have occurred within a plate tectonics regime. The most likely
scenario is a relatively short (about 300 million years) burst of mantle activity at the
end of the Archaean (around 2.5 billion years ago). This lends credence to models
advocating a magmatic origin for drastic environmental changes during the
Neoarchaean era, such as the Great Oxidation Event.


The terrestrial atmosphere contains a ¹²⁹Xe monoisotopic excess of
7.3% relative to primordial (solar or meteoritic) xenon, attributed to
the decay of the extinct radioisotope ¹²⁹I. Some ¹²⁹Xe may also have been
inherited from comets during the early stages of Earth’s accretion.
Atmospheric xenon evolved subsequently through mass-dependent
fractionation (MDF) due to selective atmospheric escape, while
preserving the mass-independent, monoisotopic excess of ¹²⁹Xe. Degassing
of mantle xenon through volcanism contributed further ¹²⁹Xe to
the atmospheric inventory, because mantle xenon is enriched in ¹²⁹Xe.
(Values of ¹²⁹Xe/¹³⁰Xe up to 7.0 are found for mantle plumes, and up to
7.8 for the mid-ocean ridge basalt (MORB) mantle source, relative to the
present-day atmospheric Xe signature, ¹²⁹Xe/¹³⁰Xe = 6.496.) Remnants
of ancient atmospheric gases have been identified in fluid inclusions
hosted in Archaean hydrothermal quartz and trapped in organic
matter isolated from Archaean chert, all from Australia and South
Africa. Whereas nitrogen, neon, argon (³⁶Ar, ³⁸Ar) and krypton have
isotopic compositions indistinguishable from the modern atmospheric
values, xenon isotopes are subject to MDF to an extent that is
intermediate between the composition of the atmospheric Xe ancestor
(labelled U-Xe) and the modern composition. The extent of isotope
fractionation increased with time to reach the modern Xe composition
around 2 billion years ago (Ga). Together with the under-abundance of
Xe in air relative to the expected abundance pattern of chondritic noble
gases, this evolution has been attributed to selective Xe escape from
the atmosphere to space via a non-thermal escape process related to
interactions between the upper atmosphere’s atoms and ultraviolet
photons from the young Sun.

Samples with ages between 3.3 Ga and 2.7 Ga present comparable
depletions of ¹²⁹Xe relative to adjacent ¹²⁸Xe and ¹³⁰Xe isotopes
irrespective of their sampling location, whereas more recent samples
have compositions consistent with that of modern atmospheric Xe
(Figs. 1 and 2; Extended Data Table 1). The deficit of ¹²⁹Xe in the Archaean
atmosphere (denoted ¹²⁹Xe_DEF) was compensated over time by degassing
of ¹²⁹Xe-rich mantle xenon (labelled ¹²⁹Xe_XS, where suffix XS refers
to ¹²⁹Xe in excess of the atmospheric composition:
$^{129}\mathrm{Xe}_{\mathrm{XS}} = {}^{130}\mathrm{Xe}_{\mathrm{mantle}} \times [(^{129}\mathrm{Xe}/^{130}\mathrm{Xe})_{\mathrm{mantle}} - (^{129}\mathrm{Xe}/^{130}\mathrm{Xe})_{\mathrm{atm}}]$).
We define Δ¹²⁹Xe as the deviation
of the sample ¹²⁹Xe/¹³⁰Xe from the value expected for fractionated modern
atmospheric xenon, in parts per thousand (‰) (Fig. 1). All available
data define a clear evolution from negative Δ¹²⁹Xe values around
3 Ga trending towards a modern-like composition starting around 2 Ga
(Figs. 2 and 3). This evolution suggests that the flux of ¹²⁹Xe_XS from the
mantle has varied considerably over time.

The amount of ¹²⁹Xe_DEF 3.3-2.7 Ga ago was (2.56 ± 1.02) × 10¹⁰ mol (95%
confidence intervals, CI), computed with an error-weighted average
Δ¹²⁹Xe of (−6.3 ± 2.5)‰ (Barberton, MGTKS3#2 and Fortescue samples,
Extended Data Table 1; 95% CI). The fractionation of Xe isotopes in
the ancient atmosphere strongly suggests that a large fraction
of xenon was lost to space between 3 Ga and 2 Ga. Considering either
an exponential law or a power law for Xe escape results in essentially
the same loss of atmospheric xenon, equivalent to 2.5 ± 0.5 times the
modern atmospheric Xe inventory (Extended Data Fig. 1). By incorporating
the simultaneous loss of atmospheric xenon to space during
the Archaean, the total amount of ¹²⁹Xe_DEF could have been as high as
(8.96 ± 3.57) × 10¹⁰ mol (Extended Data Table 2).

Fig. 1 | Principle of xenon isotope evolution over time. Data (filled symbols,
error bars are 1σ) for fluid inclusions in 3.3-Ga Barberton (South Africa)
hydrothermal quartz exemplify the Xe isotope composition of Archaean air,
normalized to the composition of modern air (horizontal black line). The Xe
isotopic composition of the Archaean atmosphere is mass-dependently
fractionated, being enriched in light isotopes relative to heavy ones. The
measured ¹²⁹Xe/¹³⁰Xe ratio is, however, below the isotope fractionation line
defined by the other Xe isotopes. This depletion in ¹²⁹Xe/¹³⁰Xe, denoted Δ¹²⁹Xe,
is defined as the distance between the observed value and that expected for
isotope fractionation of modern air (open symbol). The dotted arrows indicate
the evolution of Xe isotope fractionation through time, yielding the modern
composition around 2 Ga. The light green shading around the isotope
fractionation line represents the 2σ error of the error-weighted linear
correlation through Xe isotope data, excluding ¹²⁹Xe.

Delivery of cometary Xe is an unlikely process to account for the
temporal evolution of atmospheric Δ¹²⁹Xe (Methods), and we consider
volcanic degassing of ¹²⁹Xe_XS as the main source of Δ¹²⁹Xe variation. Contrary
to the case of radiogenic ⁴⁰Ar, which was degassed from both the
mantle and the continental crust through time“, a mantle-only origin
for radiogenic ¹²⁹Xe is certain, so that accumulation in the atmosphere
directly traces time-dependent mantle degassing and convection. Since
about 3 Ga, the average flux of ¹²⁹Xe_XS from the mantle to the atmosphere
necessary to compensate for ¹²⁹Xe_Def in the Archaean atmosphere
is equivalent to 8.5 ± 3.4 mol yr⁻¹ (closed system atmosphere’),
or 30 ± 12 mol yr⁻¹ (taking into account xenon lost to space) (95% CI).
We estimate the modern flux of ¹²⁹Xe_XS independently by scaling the ¹³⁰Xe/³He
ratio measured within mantle-derived samples to the ³He mantle
flux in the oceans from submarine volcanism”, and from subaerial
volcanoes” to be 0.9 ± 0.5 mol yr⁻¹ (Methods; Extended Data Table 2).
Thus the modern degassing rate averaged over 3 Ga would fail by one order
of magnitude to supply the amount of ¹²⁹Xe_Def that was missing in the
Archaean atmosphere.

We modelled the evolution of atmospheric Xe through time by MDF*
with different functions estimating the evolution of Δ¹²⁹Xe (Fig. 2A–D;
Methods). The model considers both cases of a closed system atmosphere
and progressive escape to space, with, in the latter case, the
amount of lost Xe (2.5 times the modern Xe inventory in total) being
scaled to the isotopic evolution of atmospheric Xe (Methods, Extended
Data Fig. 1). This model is iterative, combining progressive loss and
MDF of atmospheric xenon until 2.0 Ga with the time-dependent
evolution of Δ¹²⁹Xe (Extended Data Fig. 1). The model
requires the ¹²⁹Xe/¹³⁰Xe ratio of the ancient mantle to be estimated.
Because the production of substantial amounts of radiogenic ¹²⁹Xe
would have occurred only during the first 100 million years (Myr) of
Earth’s history given the half-life of parent ¹²⁹I (15.7 Myr), the ¹²⁹Xe/¹³⁰Xe
ratio could have only evolved by subduction/recycling of ‘modern-like’
atmospheric xenon into the mantle®””. From mass balance, we estimate
that the pre-subduction ¹²⁹Xe/¹³⁰Xe ratio of the mantle was in the
range 14 ± 1 (Methods; Extended Data Fig. 2). Although the recycling
history of atmospheric Xe into the mantle between 3 Ga and 1 Ga is not
known, numerical modelling of Xe evolution in the mantle–atmosphere
system suggests that the imprint of recycling became quantitatively
important only from 1 Ga (Extended Data Fig. 3, ref. ”). To circumvent
this uncertainty, we modelled the evolution of Δ¹²⁹Xe in the time interval
3–1 Ga. The evolution of Δ¹²⁹Xe is then modelled assuming that mantle
degassing decreased continuously (using exponential and power laws)
since the Archaean (Fig. 2A–C). Both exponential and power laws give
similar outcomes for the flux of mantle-derived ¹²⁹Xe to the 3-Ga atmosphere,
18 mol yr⁻¹ and 63 mol yr⁻¹, respectively, without considering
escape, and 64 mol yr⁻¹ and 220 mol yr⁻¹, respectively, if loss to space
is taken into account.

However, near-constant Δ¹²⁹Xe in the range 3.3–2.7 Ga followed by
a stepwise change to the modern value around 2.6–2.0 Ga (Fig. 2D)
strongly suggests that the Neoarchaean was punctuated by a short burst
of intense magmatic activity, consistent with the evolution of mantle
potential temperatures through time’ (Fig. 3). For such a model, considering
a distinct period of intense degassing between 2.6 Ga and 2.2 Ga,
we calculate a peak degassing rate of 141 mol yr⁻¹ in the case of escape to space
(Extended Data Table 4). For comparison, we estimate the modern
flux of mantle-derived ¹²⁹Xe to be 6.3 ± 2.6 mol yr⁻¹ (computed with a
¹³⁰Xe flux of 0.85 ± 0.35 mol yr⁻¹, Extended Data Table 2, and an average
mantle ¹²⁹Xe/¹³⁰Xe ratio of 7.4 ± 0.4). Our estimates rely on the mantle
¹²⁹Xe/¹³⁰Xe ratio, which is taken here to be maximal at 14 (corresponding
to a pre-subduction signature; Methods). If this ratio were lower
in the ancient mantle owing to an early onset of subduction (down
to potential modern values of 7–8), then our estimates of mantle Xe
fluxes would be increased substantially, by up to one order of magnitude
(Extended Data Fig. 6). We therefore consider our degassing rate
estimates reported above to represent lower limits. Irrespective of the
model chosen, it is therefore clear that substantially higher mantle
fluxes are required in the Archaean. Enhanced degassing would have
had a marginal effect on the atmospheric ⁴⁰Ar/³⁶Ar ratio (Methods;
Extended Data Fig. 4) and would be difficult to detect in other Xe isotope
ratios of Archaean samples (for example, the fissiogenic ones) because
all Archaean samples contain an inherited or produced fissiogenic
excess that is likely to mask the original atmospheric composition
(Methods).

Given the incompatibility of xenon during partial melting”, the rate of
mantle degassing is related to mantle melting. However, higher concentrations
of Xe in the Archaean mantle relative to its present-day budget
could potentially lower the amount of mantle degassing necessary to
account for the evolution of the ¹²⁹Xe deficit in the Archaean atmosphere.
Indeed, models investigating the time evolution of the ³He/⁴He
ratio of the mantle, for example, suggest mantle He concentrations
to be higher during the Archaean”°. With our numerical simulations
(Fig. 2), we estimate that 3%–39% of xenon could have been degassed
from the mantle since the Archaean, irrespective of the model adopted
(Extended Data Table 4). From estimates of the present-day Xe content
of the mantle calculated from the ¹³⁰Xe/³He ratio and the modern

Fig. 2 | Time evolution of the deficit of ¹²⁹Xe (Δ¹²⁹Xe) in ancient atmospheric
gases, of the atmospheric ¹²⁹Xe/¹³⁰Xe ratio, and of the flux of ¹²⁹Xe from the
mantle (φ¹²⁹Xe). Data and references are given in Extended Data Table 1. The
data are modelled in four ways: using power (A) and exponential (B) laws fitted
through all data points; using an exponential law fitted through an anchor
point at 3 Ga, −6.3‰ (C); and using a ramp function to mimic the effect of a


³He-degassing rate, we also find that the Archaean mantle Xe content
could have been at most a factor of 2 higher than that of the modern
mantle (Methods, Extended Data Table 2). Moderate mantle noble-gas
depletion since the Archaean is independently indicated by Ne isotope
systematics of the mantle–atmosphere system (Methods). We conclude
that higher noble-gas concentrations in the ancient mantle cannot
account for the >10 times greater Xe mantle fluxes in the distant past,
thus calling for enhanced magma production rates in the Neoarchaean.

We evaluate here which scenario—continuous decrease in mantle 
degassing with time or a short-lived burst of activity in the 2.6-2.2 Ga 
time period—is most likely within the framework of past mantle dynam- 
ics. The former model represents the secular waning of melt genera- 
tion at mid-ocean ridges and can in principle be rejected, as we now 
explain. In the Neoarchaean, ambient mantle temperatures were higher
than today, implying larger melt fractions!®”"”, but melt production 
rates depend on plate velocities, which are poorly constrained for that 
time. Extending plate tectonics models far back in time is fraught with 
severe uncertainties’®””*. The global rate of plate renewal, however, 
is directly related to the Earth’s heat loss, which can be deduced from 
changes of the ambient mantle temperature through time. Going back 
in time, this temperature increases and peaks at about 1,600 °C at an 
age in the 2.5-3.0 Ga range’’ (Fig. 3). Data for greater ages up to 3.5 Ga 
do not indicate any further temperature variation’®. Thus, by defini- 
tion, the mantle cooling rate was effectively zero at about 2.5 Ga. The 
global heat balance for the Earth then dictates that heat loss was equal 
to heat production. Given that heat production contributes about half 
of today’s heat loss and that it was twice as large at 2.5 Ga, heat loss was 
about equal to its present-day value at that time. Using a well-tested 
model for the thermal evolution of oceanic plates, we relate the rate of 
melt generation to heat loss and melt thickness at spreading centres 
(Methods). We show that the rate of melt production cannot have been 
more than about five times higher than today, far below values required 
by the ¹²⁹Xe data. We thus argue in favour of a relatively short burst
of mantle activity around the Archaean–Proterozoic boundary, for

massive, discrete episode of degassing 2.6–2.2 Ga (D). Three plots are shown
for each of the four models A–D: a, Δ¹²⁹Xe versus time (data points), with a
curve fitted to the data; b, ¹²⁹Xe/¹³⁰Xe versus time; and c, φ¹²⁹Xe versus time. In
the plots of Δ¹²⁹Xe, the curves and error areas (A,a, B,a; 95% confidence
interval) were produced using the error-weighted solver function of the Matlab
curve-fitting tool.


example, in the 2.6–2.2 Ga time period (Fig. 2D). As shown in the Methods,
such intense activity would necessarily be associated with a large
heat loss and would induce a dip in ambient mantle temperature if it
were long-lived. The mantle ambient temperature data do not support
this, and hence provide further support for a short phase of anomalous
melt generation.
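The heat-balance step in this argument can be written out explicitly. The following is only a sketch in notation introduced here for illustration (M, C_p, T, H, Q and the subscript 0 for present-day values are not symbols taken from the Methods), using the two round numbers quoted above: heat production supplies about half of today's heat loss, and was about twice as large at 2.5 Ga.

```latex
% Global heat balance of the mantle: cooling rate = heat production - heat loss
M C_p \frac{dT}{dt} = H(t) - Q(t)

% Zero cooling rate at about 2.5 Ga implies heat loss equals heat production:
\frac{dT}{dt} \approx 0 \;\Rightarrow\; Q_{2.5} \approx H_{2.5}

% With H_0 \approx \tfrac{1}{2} Q_0 today and H_{2.5} \approx 2 H_0:
Q_{2.5} \approx 2 H_0 \approx 2 \times \tfrac{1}{2}\, Q_0 = Q_0
```

Under these assumptions the heat loss at 2.5 Ga comes out close to the present-day value, which is the constraint the melt-production bound relies on.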

The high temperatures that prevailed in the late Archaean imply that
large melt fractions occurred in mantle upwellings, probably affecting 
the global rheological behaviour of the mantle. It has been proposed 
that the mantle may have experienced brief ‘mush ocean’ episodes” that 
punctuated longer periods of sluggish plate tectonics”. The increased 
overturn rate of cooled material during ‘mush oceans’ would have 
resulted in enhanced degassing. It would also have led to larger rates 
of heat loss and hence rapid cooling. Thus, this anomalous convec- 
tion regime was self-defeating and could not have been maintained 
for long. The progressive cooling of the mantle from 2.5 Ga onwards 
then enabled a stable plate tectonic regime” and steady-state mantle
degassing (Fig. 3). 

Several independent geological observations support the operation 
of a peculiar mantle convection regime in the late Archaean and early
Proterozoic” *°. For example, the Superior craton saw the repeated 
accretion of large individual volcanic belts and older terrains at its 
southern margin in at least five independent events, over very short 
time intervals of about 10 Myr between” 2.70 Ga and 2.65 Ga. Following 
craton assembly, the very voluminous Matachewan dyke swarm testi- 
fies to enhanced magmatic activity and large eruption rates at 2.45 Ga, 
which are not well accounted for by plate tectonics”’. Furthermore, the 
fact that the ¹²⁹Xe/¹³⁰Xe of the atmosphere has not changed since 2.2 Ga,
despite continued large-scale magmatism, indicates that (i) the concentration
of Xe in the mantle was lowered during intense degassing
periods, and/or that (ii) the initiation of subduction-driven transfer
of atmospheric Xe to the mantle during the Archaean–Proterozoic transition
diminished the difference in ¹²⁹Xe/¹³⁰Xe between the two reservoirs
(Methods).

Fig. 3 | Time evolution of the deficit of ¹²⁹Xe (Δ¹²⁹Xe) in ancient atmospheric
gases compared to petrological estimates of mantle potential temperature
(T_p) for non-arc lavas. The data for T_p (light blue diamonds) are taken from
ref. 18. The light grey curve represents the fit through the T_p data, and the light
blue shaded area exemplifies the evolution of T_p through time. The filled red
circles are the Δ¹²⁹Xe values as defined in Fig. 1 (error bars, 1σ) and in the main
text, and given in Extended Data Table 1.


The period during which Xe was most efficiently degassed from the 
mantle to the atmosphere (2.6-2.2 Ga) occurred at a time when the 
Earth was undergoing fundamental environmental changes, including 
the Great Oxidation Event”®. An intense period of mantle degassing at 
that time may therefore have been essential in promoting the transi- 
tion towards modern Earth-like conditions, which was required for 
the development of life”. Unlike Xe, other volatile elements (water, 
carbon and nitrogen species) only behave as incompatible elements 
during redox conditions akin to those of the modern mantle. The near 
invariance of redox-sensitive elements like vanadium indicates that the 
redox state of the mantle and associated basalts has remained nearly 
constant since the middle Archaean to the present”’. Thus volcanic 
H₂O, CO₂, N₂ and SO₂ (the main volcanic gas species at low pressures”)
would have been released into the Archaean atmosphere at rates comparable
to that of ¹²⁹Xe_Def. The Archaean volcanic flux of CO₂, which
is of the order of 6 x 10” mol yr⁻¹ at present”, would have been in the
range 10-10" mol yr⁻¹, comparable to the anthropogenic flux of CO₂
(7 x10" mol yr⁻¹). Such high volcanic gas fluxes could have had a tremendous
impact on the Archaean environment, providing enormous
quantities of CO₂ and SO₂ and possibly triggering the Δ³³S peak at that
epoch”. Enhanced CO₂ (and associated N₂) fluxes could have played a
major part in the thermal budget of the Earth’s surface*, by lowering
the partial pressure of atmospheric’ N₂ and by triggering the production
of organic matter that ultimately led to the Great Oxidation Event.
Methods


Deficit of ¹²⁹Xe in Archaean air relative to modern air, reservoirs
and fluxes

Δ¹²⁹Xe is the deviation of the sample ¹²⁹Xe/¹³⁰Xe ratio from the modern
atmospheric ¹²⁹Xe/¹³⁰Xe ratio, in ‰ (Fig. 1). Modern atmospheric xenon
is mass-dependently fractionated relative to ancient atmosphere*>*”.
Because atmospheric ¹²⁹Xe is contributed by a monoisotopic nuclear
effect (the decay of ¹²⁹I), its variation can be distinguished from mass-dependent
isotopic fractionation by comparison to the adjacent stable
Xe isotopes. Following Pujol et al.’, the isotope ratios are normalized to
the modern Xe isotope composition, and the slopes of the fractionation
trends (as well as the original data) are listed in refs. *°*° (original Xe data
are reported in https://zenodo.org/record/3378722#.Xa6cay3pNVF).

Δ¹²⁹Xe is the distance between the measured δ¹²⁹Xe value (green dot,
Fig. 1) and the equivalent value sitting on the fractionation line at mass
129 (white dot, Fig. 1). Values of Δ¹²⁹Xe different from 0 are identified for
three samples having ages around 3 Ga (Extended Data Table 1). Other
samples have Δ¹²⁹Xe values that are not statistically different from the
modern atmosphere composition. Most of these samples were analysed
only once and as such the resulting errors are comparatively large. For
the three samples above, we computed a mean error-weighted Δ¹²⁹Xe
value of (−6.3 ± 2.5)‰ (95% CI) for the period 3.3–2.7 Ga.
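An error-weighted mean of this kind is commonly taken to be an inverse-variance weighted average. The sketch below assumes that convention (the Methods do not spell out the formula), and the three sample values and 1σ errors are hypothetical placeholders, not the entries of Extended Data Table 1.

```python
import math


def error_weighted_mean(values, sigmas):
    """Inverse-variance weighted mean and its 1-sigma uncertainty.

    Assumes independent measurements with Gaussian 1-sigma errors.
    """
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    mean = sum(w * v for w, v in zip(weights, values)) / total
    return mean, math.sqrt(1.0 / total)


# Hypothetical per-mil deviations and 1-sigma errors, for illustration only.
values = [-7.0, -5.0, -6.5]
sigmas = [2.0, 2.5, 3.0]

mean, sigma = error_weighted_mean(values, sigmas)
# 1.96 sigma approximates a 95% confidence interval for a Gaussian error.
print(f"{mean:.2f} +/- {1.96 * sigma:.2f} per mil (95% CI)")
```

Weighting by 1/σ² lets the best-measured samples dominate the mean, which matters here because most samples were analysed only once and carry large errors.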

The amount of ¹²⁹Xe that was missing in the Archaean atmosphere,
denoted ¹²⁹Xe_Def, is computed from the mean Archaean Δ¹²⁹Xe value,
the modern Xe inventory of the atmosphere, and the isotopic composition
of modern atmospheric Xe (Extended Data Table 2). Taking the
Archaean atmospheric ratio of 6.455 (obtained by subtracting 6.3‰
from the modern ¹²⁹Xe/¹³⁰Xe ratio, after correction for mass-dependent
isotopic fractionation) instead of the modern value’ of 6.496 would
make a negligible difference compared to uncertainties in Δ¹²⁹Xe values.
We also considered a non-conservative atmosphere from which
2.5 times the modern Xe inventory is lost to space, as suggested by the
temporal evolution of Xe MDF (Extended Data Fig. 1). ¹²⁹Xe degassed
from the mantle that changes the atmospheric Xe isotopic composition
is labelled ¹²⁹Xe_XS. The average yearly flux of ¹²⁹Xe_XS was simply computed
by dividing ¹²⁹Xe_Def by 3 × 10⁹ yr (Extended Data Table 2).
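The bookkeeping in this paragraph can be checked with a few lines of arithmetic. This is a minimal sketch: the function names are hypothetical, the Archaean ratio is obtained by simple per-mil scaling (ignoring the mass-dependent fractionation correction mentioned above), and the deficits are assumed to be 2.56 × 10¹⁰ mol (closed system) and 8.96 × 10¹⁰ mol (with escape), the values consistent with the fluxes quoted in the main text.

```python
MODERN_RATIO = 6.496   # modern atmospheric 129Xe/130Xe quoted in the text
DELTA_129 = -6.3       # mean Archaean deviation, in per mil
INTERVAL_YR = 3e9      # averaging interval used in the text


def archaean_ratio(modern_ratio, delta_permil):
    """Ratio shifted from the modern value by a per-mil deviation."""
    return modern_ratio * (1.0 + delta_permil / 1000.0)


def average_flux(deficit_mol, interval_yr=INTERVAL_YR):
    """Mean flux (mol/yr) needed to supply a deficit over the interval."""
    return deficit_mol / interval_yr


print(round(archaean_ratio(MODERN_RATIO, DELTA_129), 3))  # 6.455
print(round(average_flux(2.56e10), 1))  # 8.5  (closed-system atmosphere)
print(round(average_flux(8.96e10), 1))  # 29.9 (with Xe lost to space)
```

The two fluxes agree with the 8.5 and 30 mol yr⁻¹ averages given in the main text, and both exceed the modern estimate of about 0.9 mol yr⁻¹ by roughly an order of magnitude or more.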

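The modern mantle flux of ¹²⁹Xe_XS, φ¹²⁹Xe_XS, is computed as:

$$\varphi\,^{129}\mathrm{Xe}_{XS} = \varphi\,^{3}\mathrm{He}_{mantle} \times \left(\frac{^{130}\mathrm{Xe}}{^{3}\mathrm{He}}\right)_{mantle} \times \left[\left(\frac{^{129}\mathrm{Xe}}{^{130}\mathrm{Xe}}\right)_{mantle} - \left(\frac{^{129}\mathrm{Xe}}{^{130}\mathrm{Xe}}\right)_{atm}\right] \tag{1}$$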


with (¹²⁹Xe/¹³⁰Xe)_atm = 6.496 (ref. ’). The fluxes were computed for two
different mantle sources, namely mid-ocean ridge basalt (MORB;
¹²⁹Xe/¹³⁰Xe = 7.8, ref. "), and mantle plume (¹²⁹Xe/¹³⁰Xe = 7.0, ref. °). The
resulting global value, 0.89 ± 0.47 mol yr⁻¹, encompasses both estimates
within uncertainties (Extended Data Table 2). Note that this value is an
upper limit since it assumes an end-member ratio for mantle ¹²⁹Xe/¹³⁰Xe.
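Equation (1) can be sketched as a short calculation. The function names below are hypothetical, and the example makes one simplifying assumption that the paper does not: it applies the global ¹³⁰Xe flux of 0.85 mol yr⁻¹ quoted in the main text (the product of the ³He flux and the mantle ¹³⁰Xe/³He ratio) to both end-member sources, rather than using source-specific fluxes from Extended Data Table 2.

```python
ATM_RATIO = 6.496  # modern atmospheric 129Xe/130Xe quoted in the text


def xe130_flux(flux_3he, xe130_per_3he):
    """130Xe flux (mol/yr) from a 3He flux and a mantle 130Xe/3He ratio."""
    return flux_3he * xe130_per_3he


def excess_129xe_flux(flux_130xe, mantle_ratio, atm_ratio=ATM_RATIO):
    """Eq. 1: flux of 129Xe in excess of the atmospheric ratio (mol/yr)."""
    return flux_130xe * (mantle_ratio - atm_ratio)


FLUX_130XE = 0.85  # mol/yr, global value from the main text (assumed for both)

for name, ratio in (("MORB", 7.8), ("plume", 7.0)):
    print(name, f"{excess_129xe_flux(FLUX_130XE, ratio):.2f}")
# MORB 1.11
# plume 0.43
```

Under this simplification both end-member results fall inside the quoted global range of 0.89 ± 0.47 mol yr⁻¹.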


Cometary contribution 

We test here the possibility that the ¹²⁹Xe_Def was compensated by the
delivery of cometary Xe. The analysis of volatiles released by comet 67P/
Churyumov–Gerasimenko (67P/C-G) suggests’ that comets are rich
in xenon and particularly in ¹²⁹Xe. A large amount of ¹²⁹Xe could have
been delivered by cometary impacts in the time interval 3.0–2.0 Ga.
Assuming that the xenon data from 67P/C-G?? (¹²⁹Xe/¹³⁰Xe = 7–8, Xe/
H₂O = 2.4 × 10⁻⁷, H₂O concentration ~20 wt%, density 0.55 g cm⁻³) are
representative of the cometary reservoir, a single comet with a diameter
of ~260 km impacting the Earth could have delivered the amount of
¹²⁹Xe missing in the Archaean atmosphere. For comparison, the impactor
that made the 2.02-Ga-old Vredefort impact structure (the second
largest one preserved on Earth) might have been much smaller, around
10–20 km in diameter™. Several cometary impacts would have resulted
in a similar effect, without leaving scars on Earth if they occurred in the
oceans, or if comets exploded in the upper atmosphere. However, this


possibility is not consistent with the progressive isotope evolution of 
palaeo-atmospheric xenon, which is best accounted for by escape to 
space**®”, whereas addition of cometary Xe would have forced Archaean 
atmospheric Xe towards a primitive composition rather than a modern 
atmospheric one. We therefore consider the addition of cometary ¹²⁹Xe
during the Archaean to be insignificant compared to the contribution 
from mantle degassing. 


Mantle degassing state 

We evaluate here how much mantle Xe should have been lost from
the 3-Ga mantle through time in order to supply missing ¹²⁹Xe to the
atmosphere since 3 Ga (¹²⁹Xe_Def = 2.6 × 10¹⁰ mol for a closed system
atmosphere, and 9 × 10¹⁰ mol in the case of atmospheric escape,
Extended Data Table 2). We consider two mantle sources", MORB-like
(¹²⁹Xe/¹³⁰Xe = 7.8) and mantle-plume-like (¹²⁹Xe/¹³⁰Xe = 7.0). The
mantle Xe contents are scaled to those of ³He. The MORB source ³He
content is computed from the ³He flux to the oceans and subaerial
volcanoes“, the magma generation rate at ridges (21 km³ yr⁻¹) and an
average partial melting rate of 12% (ref. *°). The plume source content
is derived from the difference in the helium isotope ratios and in the
U, Th contents between MORB and plume sources (ref.*, see ref. *° for
comparable values). Two cases are considered, the modern mantle and
the ancient, pre-subduction mantle. For the latter, we use ¹²⁹Xe/¹³⁰Xe = 14
(±1), which is our estimate for mantle xenon before atmospheric
contamination (based on Xe isotope correlations for CO₂ well gases; compare
ref. *’; Extended Data Fig. 2), and we correct the ¹³⁰Xe/³He ratio for 80%
atmospheric contribution, assumed to have taken place quantitatively
in the last billion years (see Methods section ‘Numerical modelling’
below). We finally compute the lost fraction for each reservoir and
for each scenario (Extended Data Table 3). In all scenarios, a MORB-type
reservoir would have lost between 59% and 99.4% mantle Xe. A
plume source reservoir would have lost between 3.5% and 64% Xe. A
pure depleted MORB-type composition at 3 Ga is unlikely given the
timing of continental crust growth (which was the primary cause of
mantle depletion), and a modern-like Xe isotope composition might not
have prevailed before 1 Ga. Hence it may be relevant to consider a pure
ocean island basalt (OIB)-like, or mixed MORB–plume composition, with
atmospheric Xe recycling taking place in the last billion years, yielding
a moderate 3-Ga mantle degassing state of ~50% or less. Thus the
Archaean mantle could have been richer in xenon than the modern mantle
by a factor of at most about 2.

Higher concentrations of Xe in the Archaean mantle relative to its 
present-day budget could potentially lower the amount of mantle 
degassing necessary to account for the evolution of the ¹²⁹Xe deficit
in the Archaean atmosphere. We have estimated above that the concentration
of xenon could have been <50% higher at 3 Ga based on mass
balance of the mantle reservoir. However, models investigating the time
evolution of the ³He/⁴He of the mantle, for example, require mantle
³He concentrations to be higher during the Archaean”’”. Therefore,
we have attempted to define independently the maximum amount of 
Xe that could be present within the Archaean mantle by constraining 
it with Ne isotopes. 

Neon provides a useful tool for constraining the concentration of
Xe in different reservoirs as it is not efficiently recycled to the mantle (in
contrast to Ar, Kr and Xe; ref. °°) and has been retained in the atmosphere
throughout Earth’s history (in contrast to He). An additional problem
with scaling our calculations against He would be that, unlike the case of
Ne, the isotopic composition of the mantle end-member is not known.
For Ne, if the mantle was enriched in the Archaean (3.3 Ga) relative to
the modern day by a factor of 10–20, as has been suggested for ³He,
then the progressive degassing of mantle Ne to the atmosphere with
time would result in a change in the isotopic ratio of the atmosphere, as
there is a discernible difference in the Ne isotopes between the solar/
chondritic ²⁰Ne/²²Ne ratio of the mantle (12.7–13.4, refs. °°“) and the
atmosphere (9.80). However, as of yet, no Archaean-aged samples have
shown deviations in Ne isotopes from the modern atmosphere’, indicating
limited contribution of the mantle Ne signature to the atmosphere
since 3.3 Ga.

We define the maximum possible Ne enrichment factor for the
Archaean mantle that could still preserve the modern day atmospheric
composition through time by using concentration-weighted
isotopic mixing calculations. First, we assume that any enrichment in
Ne concentrations within the Archaean mantle relative to the present
will be ultimately degassed and retained in the atmosphere. Thus, if
the mantle was 10 times more enriched in Ne during the Archaean,
the Archaean atmosphere must be depleted by the same amount. We
take the minimum ²⁰Ne/²²Ne ratio measured within 3.3-Ga
quartz-hosted fluid inclusions to be that of the Archaean atmosphere
(9.64 ± 0.05; ref. >). The mantle ²⁰Ne/²²Ne is defined as either having a
solar (13.4; ref. ") or a chondritic-like (12.7; ref. *°) composition. We
determine that to raise the ²⁰Ne/²²Ne of the Archaean atmosphere from
9.59 to the modern value of 9.8 would require 1.60 × 10¹⁴ mol of mantle
²⁰Ne to be degassed to the atmosphere assuming the mantle has solar
²⁰Ne/²²Ne, and 1.96 × 10¹⁴ mol if the mantle has chondritic ²⁰Ne/²²Ne.
The amount of mantle neon degassed to the atmosphere since 3.3 Ga
can be expressed as:


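$$^{20}\mathrm{Ne}_{degassed} = {}^{20}\mathrm{Ne}_{MA} - {}^{20}\mathrm{Ne}_{AA} = {}^{20}\mathrm{Ne}_{AI} \times \frac{\left(^{20}\mathrm{Ne}/^{22}\mathrm{Ne}\right)_{AA} - \left(^{20}\mathrm{Ne}/^{22}\mathrm{Ne}\right)_{MA}}{\left(^{20}\mathrm{Ne}/^{22}\mathrm{Ne}\right)_{AA} - \left(^{20}\mathrm{Ne}/^{22}\mathrm{Ne}\right)_{mantle}} \tag{2}$$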


where subscripts AA, MA, AI and mantle refer to Archaean atmosphere,
modern atmosphere, atmospheric inventory and mantle, respectively.
Readmitting the amount of degassed Ne back to the mantle would
result in the mantle during the Archaean being enriched by a factor of
1.2–3.8 times the present concentrations (mantle inventories obtained
from end-member mantle ²⁰Ne concentrations®**”, and a mantle mass
of 4 × 10²⁷ g) assuming a solar mantle, and 1.2–4.4 if the mantle Ne is
chondritic. The large range in these estimates is controlled primarily
by the large uncertainty on the concentration of Ne in the present-day
mantle**”.
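The mixing calculation of equation (2) can be sketched as follows. The function names are hypothetical, and the atmospheric ²⁰Ne inventory of 2.9 × 10¹⁵ mol is an assumed round value that is not given in the text; the isotope ratios are those quoted above.

```python
def degassed_fraction(r_archaean, r_modern, r_mantle):
    """Fraction of the atmospheric inventory contributed by mantle Ne (Eq. 2).

    All arguments are 20Ne/22Ne ratios.
    """
    return (r_archaean - r_modern) / (r_archaean - r_mantle)


def degassed_20ne(inventory_mol, r_archaean, r_modern, r_mantle):
    """Amount of mantle 20Ne (mol) degassed to the atmosphere."""
    return inventory_mol * degassed_fraction(r_archaean, r_modern, r_mantle)


ATM_20NE_MOL = 2.9e15            # ASSUMPTION: atmospheric 20Ne inventory, mol
R_ARCHAEAN, R_MODERN = 9.59, 9.8  # ratios quoted in the text

for label, r_mantle in (("solar", 13.4), ("chondritic", 12.7)):
    frac = degassed_fraction(R_ARCHAEAN, R_MODERN, r_mantle)
    amount = degassed_20ne(ATM_20NE_MOL, R_ARCHAEAN, R_MODERN, r_mantle)
    print(f"{label}: fraction = {frac:.4f}, degassed = {amount:.2e} mol")
# solar: fraction = 0.0551, degassed = 1.60e+14 mol
# chondritic: fraction = 0.0675, degassed = 1.96e+14 mol
```

With the assumed inventory, the sketch returns amounts close to the two values stated above; the mixing fraction itself (about 5.5% to 6.8%) depends only on the quoted ratios.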


Potential impact of Archaean degassing on atmospheric noble 
gases 

We tested the effect of intensive mantle degassing during the Archaean
on the evolution of the atmospheric ⁴⁰Ar/³⁶Ar ratio. We carried out a
mass balance calculation based on the mantle noble-gas composition
on one hand, and on the other, the amount of ¹²⁹Xe_Def in the atmosphere.
We considered two mantle sources, MORB-like and plume-like,
with noble-gas end-member compositions”. The respective
⁴⁰Ar/³⁶Ar ratios were computed at 3 Ga (correcting for radiogenic ⁴⁰Ar
produced afterwards), and we considered a pre-subduction, Archaean
¹²⁹Xe/¹³⁰Xe ratio of 14 (Extended Data Fig. 2). Results suggest that the
contribution of Archaean mantle degassing to the ⁴⁰Ar atmospheric
inventory was of the order of a few per cent (Extended Data Table 3).
We tested the effect of 5% and 10% ⁴⁰Ar inventory degassing during
a sudden release of ¹²⁹Xe at 2.6–2.2 Ga, with a K–Ar box model similar
to that used by Pujol et al.” that includes early degassing and crustal
growth. The evolution curves are depicted in Extended Data Fig. 3.
In principle, a jump of the Ar isotopic ratio around that period of
time could be observable, but uncertainties related to the contribution
of ⁴⁰Ar produced in situ in samples could mask such an effect.
Thus we conclude that a massive Archaean mantle degassing event
would not have drastically affected the radiogenic ⁴⁰Ar budget of
the atmosphere.

During Archaean degassing, fissiogenic Xe isotopes were also
released together with ¹²⁹Xe_XS from the mantle to the atmosphere.
In mantle-derived samples, ¹²⁹Xe/¹³⁰Xe correlates with ¹³⁶Xe/¹³⁰Xe
with a slope of 3.0 for both MORB and plume sources!” as a result
of contributions of radiogenic ¹²⁹Xe and fissiogenic ¹³⁶Xe (¹³⁶Xe_F). An
Archaean Δ¹²⁹Xe value of −6‰ (Extended Data Table 1) would therefore
correspond to a deficit of ¹³⁶Xe_F of about −2‰ in Archaean air. Such a
variation would be barely detectable in ancient samples. Archaean
samples analysed so far**® present positive Δ¹³⁶Xe_F values of +30‰
(Barberton sample’) and higher’, with fission spectra consistent with
production from ²³⁸U fission®. Thus any potential effect of mantle
degassing is likely to be masked by the inheritance of fissiogenic Xe
from the trapped crustal fluids and/or the in situ production from ²³⁸U
fission after emplacement of the rocks at the surface. This problem
would prevent detection of any effect on the Archaean atmospheric
composition of fissiogenic Xe.


Numerical modelling 

We consider three scenarios around the evolution curve of the isotopic
composition of atmospheric Xe (Extended Data Fig. 1), which has been
modelled to follow a power law defined by y=0.238x** (ref. *). New data
on the isotopic composition of ancient atmosphere Xe from fluid inclusions
in hydrothermal quartz have been recently published® that support
the validity of this evolution curve. Also reported in Extended Data
Fig. 1 is the theoretical amount of extra ATM_Xe in the atmosphere scaled
on Xe isotopic evolution, where ATM_Xe stands for the total inventory of
Xe in the present-day atmosphere. Over the lifetime of the atmosphere,
~10 ATM_Xe would have been lost to space.

In the first scenario, we consider that Xe degassing from the mantle
occurred after Xe loss to space ended. This implies that the Δ¹²⁹Xe
remained constant (at (−6.3 ± 2.5)‰) from 3 Ga to 1 Ga, before Δ¹²⁹Xe
was raised to 0‰ solely through mantle degassing. In this case, mantle
degassing takes place while the Xe isotope signature of the atmosphere
is already modern-like, with no concomitant loss to space.

In the second scenario, we consider that Xe degassing from the mantle
(with Δ¹²⁹Xe varying from (−6.3 ± 2.5)‰ to 0‰) occurred at 3 Ga,
when the atmosphere was mass-dependently fractionated by −10‰ u⁻¹
and had about 3.5 times the present-day inventory of atmospheric Xe
(Extended Data Fig. 1, right-hand y axis). In this case, given that Xe is
more abundant in the atmosphere than in the first scenario, the total
amount of mantle-derived ¹²⁹Xe required to fill the ¹²⁹Xe deficit (¹²⁹Xe_Def)
is also larger than in the first scenario.

The first and second scenarios do not represent real-world conditions,
as they assume that degassing and loss did not occur simultaneously,
but they are useful in setting the boundary conditions of this
model. In the third scenario, we produce an iterative model combining
progressive loss and MDF of atmospheric Xe (Extended Data Fig. 1)
with the evolution of Δ¹²⁹Xe (Fig. 2). To model the latter, we test four
possibilities by fitting Δ¹²⁹Xe data with a power law, an exponential law,
an exponential law anchored at 3 Ga, or a ramp function (Fig. 2A–D). At
each step of the iteration
(i), the atmosphere is allowed to evolve by both loss to space and MDF
(Extended Data Fig. 4). The ¹²⁹Xe/¹³⁰Xe ratio is then computed by using
both Δ¹²⁹Xe (i − 1) and Δ¹²⁹Xe (i). The contribution of mantle-derived
¹²⁹Xe (¹²⁹Xe_XS) to the atmospheric budget of ¹²⁹Xe from step i − 1 to step
i is then calculated given the equation:


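$$M_{contrib\,^{129}\mathrm{Xe}} = \frac{\left(^{129}\mathrm{Xe}/^{130}\mathrm{Xe}\right)_{i} - \left(^{129}\mathrm{Xe}/^{130}\mathrm{Xe}\right)_{i-1}}{\left(^{129}\mathrm{Xe}/^{130}\mathrm{Xe}\right)_{m} - \left(^{129}\mathrm{Xe}/^{130}\mathrm{Xe}\right)_{i-1}} \tag{3}$$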


where (¹²⁹Xe/¹³⁰Xe)_i and (¹²⁹Xe/¹³⁰Xe)_{i−1} are the ¹²⁹Xe/¹³⁰Xe of the atmosphere
at steps i and i − 1, respectively, and (¹²⁹Xe/¹³⁰Xe)_m is the ¹²⁹Xe/¹³⁰Xe
of the mantle source. The amount (in mol) of mantle-derived ¹²⁹Xe
degassed into the atmosphere between step i − 1 and step i is then
calculated as:


$$M_{^{129}\mathrm{Xe},\,i} = M_{contrib\,^{129}\mathrm{Xe}} \times \mathrm{ATM}_{^{129}\mathrm{Xe},\,i} \tag{4}$$


where ATM_¹²⁹Xe,i is the total amount (in mol) of ¹²⁹Xe in the atmosphere
at step i given the evolution curve of Xe loss and ¹²⁹Xe/¹³⁰Xe computed
from the evolution curve of atmospheric Xe isotopes and δ¹²⁹Xe_Def,i.

However, determining the ¹²⁹Xe/¹³⁰Xe of the mantle is not straightforward,
given that this ratio also evolved through time by ¹²⁹Xe production
through radioactive decay of now-extinct ¹²⁹I and recycling
of atmospheric Xe into the solid Earth. Given that the half-life of ¹²⁹I
is short (T₁/₂ = 15.7 Myr), the whole budget of ¹²⁹Xe* (that is, ¹²⁹Xe produced
by the decay of ¹²⁹I) should have been established early in Earth’s
history, within the first ~100 Myr. The recycling of atmospheric Xe to
the mantle is considered to be extensive, with the present-day inventory
of Xe in the mantle dominated by 80%–90% recycled modern
atmosphere”. Correcting the mantle ¹²⁹Xe/¹³⁰Xe (7.8) for the contribution
of recycled atmosphere (80%–90%) would yield a ¹²⁹Xe/¹³⁰Xe
in the range of 13–17 for the primitive convective mantle. The initial
¹²⁹Xe/¹³⁰Xe of the convective mantle can also be estimated from
¹²⁸Xe/¹³⁰Xe versus ¹²⁹Xe/¹³⁰Xe correlations in magmatic CO₂ well gases
(ref. *?, Extended Data Fig. 2). The ¹²⁸Xe/¹³⁰Xe of the initial mantle is
taken as the chondritic value (¹²⁸Xe/¹³⁰Xe_AVCC = 0.5073 ± 0.0038, where
suffix AVCC refers to Average Carbonaceous Chondrite, ref. ’). Extrapolating
the ¹²⁹Xe/¹³⁰Xe to ¹²⁸Xe/¹³⁰Xe_AVCC yields a ¹²⁹Xe/¹³⁰Xe_initial between
13 and 15, in good agreement with independent estimates from the
fraction of recycled atmosphere in the mantle. Note
that these estimates assume that the atmospheric Xe component in 
the mantle has a modern atmospheric composition. If Xe was extensively
recycled to the mantle while the Xe composition of the atmosphere
was still evolving, then estimating the ¹²⁹Xe/¹³⁰Xe of the mantle
during the Archaean becomes more complicated. However, Parai and
Mukhopadhyay” proposed that substantial full-scale recycling of
atmospheric xenon into the solid Earth could not have occurred 
before 2.5 Ga, given that (i) the isotopic composition of atmospheric 
Xe progressively evolved through time by MDF and reached the mod- 
ern composition around 2 Ga (Extended Data Fig. 1), and (ii) the 
Xe atmospheric component in the present-day mantle is indistinguish- 
able from modern atmosphere. Although the recycling history 
of atmospheric Xe into the mantle between 2.5 Ga and 1 Ga is not 
known, constraints on the amount of Xe being transported into the 
solid Earth over time through atmospheric recycling have been 
recently set via numerical modelling of Xe evolution in the mantle- 
atmosphere system (Extended Data Fig. 5, ref. ”). While some small 
scale recycling of atmospheric Xe to the mantle might have occurred 
before 2.5 Ga, it would have had a limited effect on the budget and 
isotopic composition of mantle Xe, and we therefore consider our 
estimates of mantle ¹²⁹Xe/¹³⁰Xe between 13 and 15 during the Archaean
to be valid. 
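
The two numerical arguments above (rapid decay of ¹²⁹I and the correction for recycled atmosphere) can be sketched as follows. This is a minimal illustration, not the calculation used in the study: the function names are hypothetical, and the mixing is assumed to be a simple two-end-member balance in which the recycled fraction is the share of mantle ¹³⁰Xe derived from the atmosphere. The atmospheric ratio of 6.496 is taken from Extended Data Table 2.

```python
I129_HALF_LIFE_MYR = 15.7  # half-life of 129-I quoted in the text


def fraction_decayed(elapsed_myr, half_life_myr=I129_HALF_LIFE_MYR):
    """Fraction of an extinct radionuclide that has decayed after elapsed_myr."""
    return 1.0 - 0.5 ** (elapsed_myr / half_life_myr)


def unmix_ratio(mixed_ratio, atm_ratio, recycled_fraction):
    """Remove an atmospheric component from a two-end-member mixture.

    recycled_fraction is assumed to be the share of mantle 130-Xe that is
    recycled atmosphere (an assumption of this sketch, not of the source).
    """
    if not 0.0 <= recycled_fraction < 1.0:
        raise ValueError("recycled_fraction must be in [0, 1)")
    return (mixed_ratio - recycled_fraction * atm_ratio) / (1.0 - recycled_fraction)


MANTLE_RATIO = 7.8  # present-day mantle 129Xe/130Xe (text)
ATM_RATIO = 6.496   # modern atmospheric 129Xe/130Xe (Extended Data Table 2)

decayed_100myr = fraction_decayed(100.0)
primitive_80 = unmix_ratio(MANTLE_RATIO, ATM_RATIO, 0.80)
print(f"129-I decayed after 100 Myr: {decayed_100myr:.1%}")
print(f"Primitive mantle 129Xe/130Xe at 80% recycled: {primitive_80:.1f}")
```

Under these assumptions almost all ¹²⁹I has decayed after 100 Myr, and an 80% recycled fraction reproduces the lower bound of about 13. The corrected ratio rises steeply as the recycled fraction approaches 90%, so the upper end of the quoted range depends on the exact fraction and end-member values adopted.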

The Δ¹²⁹Xe evolution curves are represented in Fig. 2. We also provide
the time evolution of the atmospheric ¹²⁹Xe/¹³⁰Xe ratio and the flux
φ of mantle-derived ¹²⁹Xe (φ¹²⁹Xe) to the atmosphere. The ¹²⁹Xe/¹³⁰Xe
ratio might not vary monotonically because two independent
processes (namely MDF of the atmosphere and mantle degassing)
cause this ratio to vary (decrease and increase through
time, respectively). In the case of a short burst, the ¹²⁹Xe/¹³⁰Xe ratio
would first decrease due to MDF, then increase during the burst (during
which MDF is still ongoing but mantle degassing dominates),
and then decrease again because of MDF. However, the Δ¹²⁹Xe would
either remain stable during periods of limited mantle degassing, or
increase towards 0 during period(s) of intense mantle degassing. The
results of the different model versions are summarized in Extended
Data Table 4.


Rate of melt production by mantle convection 

For a physical model of the Earth’s secular thermal evolution, one 
needs an equation that relates heat loss to temperature. Many past 
efforts have been based on physical models of sea floor spreading 
that have been calibrated using present-day plate characteristics’**. 
These models must be tuned to account for changes of plate structure,
density and rigidity arising from the higher temperatures and larger
amounts of melting that prevailed in the past. A major difficulty is
that plate velocities and sizes vary by about one order of magnitude 
on modern Earth”, so that extrapolating plate tectonics far back 
in time is uncertain. Other models of secular cooling have relied on 
first-principles convection calculations, but it has proven difficult to 
reproduce plate tectonics owing to the large lithospheric strength 
that must be overcome to initiate subduction. Thus, quantitative 
models for mantle convection in the Archaean must be regarded as 
tentative***5.

Here, we circumvent this difficulty by deriving a general relationship
between heat flux and melt production rate. We assume that the
degassing of basaltic melts proceeds to completion so that the degassing
rate is proportional to the melt generation rate, which leads to a
lower bound on the melting rate. A general equation for the surface
heat flux can be written for all convection regimes save the ‘stagnant
lid’ one. In the latter regime, convection develops below a rigid layer
that caps the whole planet and does not allow surface motions. For all
other regimes, the basic principle is that cooling is effected in a thermal
boundary layer at the top of the mantle, where vertical velocities are
negligible. Heat loss is therefore due to conduction and depends on
the residence time of material at the surface. This principle has been
thoroughly tested in laboratory experiments and numerical calculations,
as well as on the current oceanic plates*. Denoting the mantle
potential temperature by $T_p$ and the surface temperature by $T_s$,
the total heat loss due to convection is:


$$Q_o = \Psi(f)\, S_o\, k\, \frac{T_p - T_s}{\sqrt{\pi \kappa \tau_m}} \tag{5}$$


where $k$ is thermal conductivity, $\kappa$ is thermal diffusivity and $S_o$ is the
total surface of oceanic plates involved in convective motions. A key
parameter is $\tau_m$, which is the maximum age of sea floor at the Earth’s
surface. $\Psi(f)$ depends on the distribution of sea floor ages, which is
described by some function:


$$f(\tau/\tau_m) = \frac{dS}{d\tau} \tag{6}$$


which is the surface increment per unit age between ages $\tau$ and
$\tau + d\tau$. The current age distribution on Earth is ‘triangular’,
such that it decreases linearly from a maximum at $\tau = 0$ to zero at
$\tau = \tau_m$. This contrasts with standard convective systems for
which the age distribution is ‘rectangular™®, that is, constant between
$\tau = 0$ and $\tau = \tau_m$. The difference between the two
distributions has a small impact on factor $\Psi(f)$ in equation (5), which
is not important for this discussion. The rate of sea floor generation,
denoted $C_A$, is such that (with $f$ normalized to unity at zero age):


$$S_o = C_A\, \tau_m \int_0^1 f(u)\, du \tag{7}$$


The age distribution has again a minor impact on the result
(a maximum factor of two). The thickness of melt produced, denoted
$H$, may be calculated from thermodynamics* as a function of the
mantle potential temperature $T_p$ for a given mantle composition. The
volume of melt produced per unit time in a plate tectonic regime is
equal to:


$$\phi = C_A H \tag{8}$$


This leads to a relationship between the melt production rate and 
heat loss: 


$$\phi = \frac{1}{A}\, \frac{Q_o H}{(T_p - T_s)\sqrt{\tau_m}} \tag{9}$$


where $A$ is a constant.
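
The elimination behind this relationship can be sketched as follows. This is a reconstruction, not a derivation given in the text: it assumes that the heat loss takes the form $Q_o = \Psi(f) S_o k (T_p - T_s)/\sqrt{\pi\kappa\tau_m}$, that the plate surface is $S_o = C_A \tau_m I$ with $I = \int_0^1 f(u)\,du$, and that the melt production rate is $\phi = C_A H$. The explicit expression for $A$ is an inference from those forms.

```latex
\begin{align}
C_A &= \frac{S_o}{\tau_m I}, \\
S_o &= \frac{Q_o \sqrt{\pi\kappa\tau_m}}{\Psi(f)\, k\, (T_p - T_s)}, \\
\phi = C_A H &= \frac{\sqrt{\pi\kappa}}{\Psi(f)\, I\, k}\,
  \frac{Q_o H}{(T_p - T_s)\sqrt{\tau_m}},
\qquad A = \frac{\Psi(f)\, I\, k}{\sqrt{\pi\kappa}} .
\end{align}
```

At fixed heat loss and temperature contrast, the melt production rate therefore scales as $H/\sqrt{\tau_m}$.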

We may now evaluate the conditions that are needed for a >10-fold
change in melting rate. As shown in the main text, the Archaean heat
flux was about equal to today’s value. The mantle temperature was
about 200 °C higher than today but this only implies an ~20% change
of the overall temperature contrast $(T_p - T_s)$, which does not change the
present argument. According to ref.”, the thickness of melt produced
in hot Archaean mantle was in the range 25-35 km, corresponding to
at least a threefold increase with respect to the present-day value.
Because the melt production rate scales as the melt thickness divided
by the square root of the maximum plate age at fixed heat loss, a
>10-fold increase in melt production rate would require the maximum
age of oceanic plates to be decreased by a factor of
at least $3^2 = 9$. Today, this maximum age is 180 Myr and it is not clear
how plate tectonics could have operated over less than 20 Myr in the
Archaean.
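
The arithmetic of this argument can be checked with a short sketch. It assumes that the melt production rate scales as Q·H / ((T_p − T_s)·√τ_m), which is the form implied by the factor of 3² = 9 quoted above; the function name and its parameters are hypothetical and introduced only for illustration.

```python
PRESENT_MAX_AGE_MYR = 180.0  # present-day maximum sea floor age (text)


def required_age_reduction(melt_increase, thickness_increase,
                           heat_loss_ratio=1.0, contrast_ratio=1.0):
    """Factor by which the maximum plate age must shrink.

    Assumes melt production scales as Q * H / (dT * sqrt(tau_m)), so that
    sqrt(tau_m) must shrink by melt_increase * dT_ratio / (Q_ratio * H_ratio).
    """
    sqrt_factor = (melt_increase * contrast_ratio
                   / (heat_loss_ratio * thickness_increase))
    return sqrt_factor ** 2


# Tenfold melt production with a threefold thicker melt layer,
# unchanged heat loss and unchanged temperature contrast.
factor = required_age_reduction(melt_increase=10.0, thickness_increase=3.0)
archaean_max_age_myr = PRESENT_MAX_AGE_MYR / factor
print(f"Required reduction in maximum plate age: x{factor:.1f}")
print(f"Implied Archaean maximum plate age: {archaean_max_age_myr:.1f} Myr")
```

With exactly a tenfold increase the required reduction is slightly larger than the rounded factor of nine quoted in the text, so the implied maximum plate age falls below 20 Myr. Setting `contrast_ratio=1.2` to reflect the ~20% larger temperature contrast makes the constraint more severe.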

We have focused on the heat flow through oceanic plates and have 
not discussed the potential influence of continents. This is not needed 
here for the following reason. Continental heat flow is very close to the
amount of heat released by radioactive decay in crustal rocks*, so that 
our conclusion that the Archaean oceanic heat flow had to be about 
equal to mantle heat production still stands. 


A short-lived burst of mantle activity

Enhanced degassing necessarily implies enhanced melting and heat 
loss, and hence enhanced cooling of the mantle, which must lead to 
a dip of mantle temperature if it is maintained for a long time. The 
thermal impact of a pulse of high mantle activity may be difficult to 
detect, however. Mantle temperatures have been determined with 
a precision of about +60 °C (ref. *°) at time steps of a few hundred 
million years and exhibit scatter (about 100-150 °C) for ages older 
than 2.0 Ga (ref. ”). With the current net energy loss of Earth (equal 
to heat loss minus heat production), it takes about one billion years 
for the mantle temperature to drop by 100 °C. A net energy loss 
that is ten times larger would lead to the same temperature drop 
in 100 Myr, at the detection limit of the current temperature data. 
Combining heat balance arguments with constraints from ambient 
mantle temperatures and ¹²⁹Xe data should allow tight bounds to
be set on the intensity and duration of anomalous mantle activity 
(Fig. 3). 
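
The heat balance scaling used in this paragraph can be written as a one-line relation. The sketch below assumes that the cooling rate is simply proportional to the net energy loss, starting from the present-day rate of about 100 °C per billion years quoted above; the function name and default are illustrative only.

```python
def cooling_time_myr(temperature_drop_c, net_loss_multiplier,
                     present_rate_c_per_gyr=100.0):
    """Time in Myr for the mantle to cool by temperature_drop_c.

    Assumes the cooling rate is the present-day rate multiplied by
    net_loss_multiplier (linear scaling with net energy loss).
    """
    rate_c_per_myr = present_rate_c_per_gyr * net_loss_multiplier / 1000.0
    return temperature_drop_c / rate_c_per_myr


present_day = cooling_time_myr(100.0, net_loss_multiplier=1.0)
tenfold = cooling_time_myr(100.0, net_loss_multiplier=10.0)
print(f"100 C drop at present-day net loss: {present_day:.0f} Myr")
print(f"100 C drop at tenfold net loss: {tenfold:.0f} Myr")
```

Under this assumption, the same function gives the duration over which a burst of a given intensity would produce a temperature drop comparable to the stated precision of the temperature record.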


Data availability 
org/record/3378722#.Xa6cMi3pNVE.
The Matlab code for modelling the degassing rate of Xe from the mantle
is available at https://zenodo.org/record/3381874#.Xa6cey3pNVE. 

Extended Data Fig. 1 | MDF of atmospheric Xe with time relative to the
modern atmosphere. Grey and blue data points** define the evolution (red
curve) of atmospheric Xe mass-dependent fractionation (MDF)*. The left-hand
y axis shows the isotopic fractionation of atmospheric Xe (δXe_air) in units of ‰
per atomic mass unit (u). The right-hand y axis represents multiples of the Xe
inventory of the modern atmosphere, ATM,,. Error bars, ±2σ. The purple point
on the left-hand side (ATM) is the modern atmospheric composition, the red
dot on the right-hand side (U-Xe) is the primordial composition of atmospheric
xenon’, the grey-shaded area shows the data range from ref.° and references
therein, and the dotted horizontal line gives the MDF value; the ATM, values
correspond to a mean age of 3 Ga.

Extended Data Fig. 2 | Plot of ¹²⁸Xe/¹³⁰Xe versus ¹²⁹Xe/¹³⁰Xe for CO₂ well gases.
Open circle data points are from ref. **, and the pink filled circle shows the
isotopic composition of air (error bars, 1σ). The boxed area at lower left is
shown magnified in the inset. There is a correlation between the excess ¹²⁹Xe
and ¹²⁸Xe (thick line, dotted thin lines define the error envelope, 95% CI) that can
be used to extrapolate the primordial ¹²⁹Xe/¹³⁰Xe of the mantle source for an
AVCC-like ¹²⁸Xe/¹³⁰Xe (dashed black line).

Extended Data Fig. 3 | Modelled evolution of the atmospheric ⁴⁰Ar/³⁶Ar ratio
as a function of time following a mantle degassing event between 2.6 Ga and
2.2 Ga. The values are scaled to ”°Xe,,,, and are shown with different
contributions of mantle ⁴⁰Ar: 0% (‘Monotonic’), 5% and 10%. Atmospheric
⁴⁰Ar/³⁶Ar ratios are normalized to the present-day value of 298.6, and the
evolution curves were adjusted in order to yield the modern value. The
Archaean atmosphere’s value is from ref. *. The yellow dot marks the end of
catastrophic degassing and the start of continuous degassing, following ref."*.

Extended Data Fig. 4 | Schematic representation of the method used to
calculate the contribution of mantle-derived ¹²⁹Xe to the atmospheric
budget of ¹²⁹Xe from step i−1 to step i. The format is the same as in Fig. 1,
where the y axis corresponds to δXe_air (only indicative here). The ‘mantle’ arrow
indicates that the ¹²⁹Xe/¹³⁰Xe of the mantle end-member is high, and would plot
off-graph in this space. Δ¹²⁹Xe values at each step of the simulation are reported
on the left of the corresponding data points. The dashed line corresponds to
the MDF line, with the shaded blue area representing the corresponding error
envelope.

Extended Data Fig. 5 | Time series showing possible scenarios of mantle
regassing histories. Shown is recycling of atmospheric Xe into the mantle
(blue lines, left-hand y axis)” compared to the time evolution of atmospheric Xe
isotopic composition‘ (δXe_air, right-hand y axis). The pink arrow shows the
direction of atmospheric Xe isotopic evolution, from U-Xe (the progenitor of
atmospheric Xe) to present. This illustrates the fact that regassing of
atmospheric Xe into the mantle would have become efficient only after
atmospheric Xe had reached a modern-like isotopic composition, that is,
within the last 1.5 Gyr. Adapted from ref. ”.

Extended Data Fig. 6 | Maximum flux of Xe (represented by ¹³⁰Xe) degassed
from the mantle as a function of the mantle ¹²⁹Xe/¹³⁰Xe ratio. Computations
reported in Fig. 2 of the main text have been carried out using a fixed mantle
¹²⁹Xe/¹³⁰Xe of 14. Here we show that lowering this ratio (for example, via
subduction of atmospheric Xe) down to modern mantle-like ¹²⁹Xe/¹³⁰Xe = 7-8
would result in even greater Xe fluxes from the Archaean mantle (see black
arrow). Given that the onset of atmospheric Xe recycling into the mantle is not
well known, possible ¹³⁰Xe flux values from the Archaean mantle are within the
range 10-150 mol yr⁻¹ (orange curve), well above the modern flux value
(0.85 ± 0.35 mol yr⁻¹, horizontal dashed line; Extended Data Table 2).


Extended Data Table 1 | Δ¹²⁹Xe values (in ‰) versus ages (in Ga)
Sample names (left column), locations, ages and original Xe data can be found in refs. *°* and at https://zenodo.org/record/3378722#.Xa6cMi3pNVE.

Extended Data Table 2 | Archaean atmospheric inventory
and modern mantle flux


Atmospheric ¹²⁹Xe inventory

                                        Value        ±            Refs./notes
Atm. Xe                                 1.54×10¹³    1.77×10¹¹    7
Modern atm. ¹²⁹Xe/¹³⁰Xe                 6.496                     7
¹²⁹Xe/¹³⁰Xe with 6.3 ± 2.5 ‰ deficit    6.455        0.016        this work
¹²⁹Xe_def deficit in the atmosphere     2.56×10¹⁰    1.02×10¹⁰    95% CI
Average ¹²⁹Xe_xs flux over 3.0 Ga       8.5          3.4          mole/yr
Atm. Xe lost to space
3.5 times modern Xe inventory           8.96×10¹⁰    3.57×10¹⁰    95% CI
Average ¹²⁹Xe_xs flux over 3.0 Ga       30           12           mole/yr


Modern ¹²⁹Xe_xs flux from the mantle

                                        Value        ±            Refs./notes
³He flux from mantle to oceans          527          102          15
³He flux from subaerial volcanism       275          35           16
Global ³He flux from the mantle         802          137          mole/yr
Mantle ³He/¹³⁰Xe                        950          50           10
¹³⁰Xe flux                              0.85         0.32         mole/yr
MORB mantle ¹²⁹Xe/¹³⁰Xe                 7.8                       10,11
MORB mantle ¹²⁹Xe_xs flux               1.11         0.42
Plume mantle ¹²⁹Xe/¹³⁰Xe                7                         10,11
Plume mantle ¹²⁹Xe_xs flux              0.44         0.17
Modern ¹²⁹Xe_xs flux                    0.89         0.47         mole/yr


Top, atmospheric inventory of missing ¹²⁹Xe in the Archaean atmosphere (¹²⁹Xe_def). Bottom,
modern mantle ¹²⁹Xe_xs flux. Xe isotope fractionation in modern air indicates®* specific loss
of Xe from the atmosphere to space from 4.5 Ga to about 2.0 Ga. The amount of ¹²⁹Xe lost to
space between 3.0 Ga and about 2.0 Ga is equal to 2.5 times the modern inventory (Extended
Data Fig. 1). For the modern mantle ¹²⁹Xe_xs flux, we considered a mantle made of 1/3 plume
source and 2/3 MORB source, which yields a value intermediate between those computed for
either a plume composition or a MORB composition, respectively. Data are from refs.”0™>"®,
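
The chain of arithmetic in the lower half of this table can be reproduced with a short sketch. It is an illustration under stated assumptions rather than the authors' calculation: the excess ¹²⁹Xe flux is taken as the ¹³⁰Xe flux multiplied by the difference between the mantle and atmospheric ¹²⁹Xe/¹³⁰Xe ratios, and the constant and function names are hypothetical. All input values are read from Extended Data Table 2.

```python
# Inputs read from Extended Data Table 2 (fluxes in mol/yr).
HE3_FLUX_OCEANS = 527.0
HE3_FLUX_SUBAERIAL = 275.0
HE3_TO_XE130 = 950.0      # mantle 3He/130Xe
ATM_RATIO = 6.496         # modern atmospheric 129Xe/130Xe
MORB_RATIO = 7.8          # MORB mantle 129Xe/130Xe
PLUME_RATIO = 7.0         # plume mantle 129Xe/130Xe (tabulated as 7)
PLUME_SHARE = 1.0 / 3.0   # mantle taken as 1/3 plume, 2/3 MORB (caption)


def xe129_excess_flux(xe130_flux, mantle_ratio, atm_ratio=ATM_RATIO):
    """Excess 129Xe flux, assumed to be flux_130 * (R_mantle - R_atm)."""
    return xe130_flux * (mantle_ratio - atm_ratio)


he3_flux = HE3_FLUX_OCEANS + HE3_FLUX_SUBAERIAL
xe130_flux = he3_flux / HE3_TO_XE130
morb_flux = xe129_excess_flux(xe130_flux, MORB_RATIO)
plume_flux = xe129_excess_flux(xe130_flux, PLUME_RATIO)
modern_flux = (1.0 - PLUME_SHARE) * morb_flux + PLUME_SHARE * plume_flux

print(f"Global 3He flux: {he3_flux:.0f} mol/yr")
print(f"130Xe flux: {xe130_flux:.2f} mol/yr")
print(f"MORB 129Xe excess flux: {morb_flux:.2f} mol/yr")
print(f"Plume 129Xe excess flux: {plume_flux:.2f} mol/yr")
print(f"Modern 129Xe excess flux: {modern_flux:.2f} mol/yr")
```

The results agree with the tabulated values to within about 0.02 mol/yr; the small differences are consistent with rounding of the tabulated inputs, in particular the plume ratio.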


Extended Data Table 3 | Mantle and atmosphere inventories 


Modern mantle 

*He mantle source 

Xe/He 

Mantle '*°Xe (mass: 4.10*’ g) 
2X e/' "Xe 

Mantle Xe 

% degassed: Closed atm 

% degassed: Atm, escape 


Archean mantle 

*He mantle source 

'°Xe/He 

Mantle '°Xe (mass: 4.10°” g) 
Xe! Xe 

Mantle Xe 

% degassed: Closed atm 

% degassed: Atm escape 


°¥ePHe 
*He/*Ar 
°XePAr 
*Ar/*Ar 
“Ar/*Arat3 Ga 
"Xe/Ar 
9X e/'™Xe 
"*Xexs/"Ar 
2 Xeper 

“Ar degassed 
“Ar atm 

% “Ar atm 


Mantle 


MORB source 


Plume source 


1.50x10°* 1.50x10"" 
1.05x10" 1.05x10° 
6.32x10" 6.32x10"' 
78 7.0 
1.90x10° 1,76x10" 
97.8% 34.2% 
99.4% 64.5% 
1.50x10'* 1.50x10" 
2.11x10* 2.11x10* 
1.26x10° 1.26x10"! 
14 14 
4.05x10'° 5.66x10"* 
59.1% 3.5% 
67.7% 5.4% 
Atmosphere 
MORB source Plume source 
1.05x10° 1.05x10° 
0.5 0.8 
8.16x10* 8.16x10" 
40,000 10,000 
7584 1896 
1.08x107 4.31x10° 
14 14 
8.07x10" 3.23x10° 
9,0x10'° 9.0x10'° 
1.11x10"7 2.79x10'° 
1,65x10"* 1,65x10'* 
6.8% 1.7% 


Contrib. to atmosphere 


Notes/Refs 


35 
10 


35 
10 


This work 


Notes/Refs 


10,11 


Pre-subduction 


Atm. escape 


T 


Concentrations are given in mol g⁻¹ and abundances are given in mol. Modern ¹³⁰Xe/³He ratios
for MORB and plume sources are similar within"° 10%. Modern mantle fluxes and concentrations
are from ref. * and noble-gas compositions are from refs. ”"°". Xe contents are computed
from ¹³⁰Xe percentage for the different Xe isotope compositions. The pre-subduction mantle
composition uses extrapolated ¹²⁹Xe/¹³⁰Xe ratios (Extended Data Fig. 2 and Methods) and, for
Xe abundance, is obtained by removing 80% atmospheric Xe, corresponding to atmospheric
contamination that took place during the Proterozoic. The ⁴⁰Ar/³⁶Ar ratios at 3 Ga are
corrected for ⁴⁰K production during the last 3 Ga.

Extended Data Table 4 | Results of models for flux evolution through time


129X enec @ max 


0/ 130 
(moles) (mole/yr) ao 


Marty 2012 Halliday 2013 


Scenario 1 
(Lower limit) 
Scenario 2 


22ixi0” 1.5% 19% 


T8Tx10" 51% 45% 
(Upper limit) x : ‘ 


Scenario 3, no escape
Power law            1.81×10¹⁰    18 (3 Ga)
Exponential 1        1.56×10¹⁰    25 (3 Ga)
Exponential 2        2.25×10¹⁰    63 (3 Ga)
Ramp (2.6-2.2) Ga    2.23×10¹⁰    56 (2.6 Ga)
Ramp (2.5-2.4) Ga    2.23×10¹⁰    224 (2.5 Ga)


Scenario 3 with escape
Power law            3.98×10¹⁰    64 (3 Ga)
Exponential 1        3.77×10¹⁰    89 (3 Ga)
Exponential 2        6.10×10¹⁰    220 (3 Ga)
Ramp (2.6-2.2) Ga    4.85×10¹⁰    141 (2.6 Ga)
Ramp (2.5-2.4) Ga    5.02×10¹⁰    519 (2.5 Ga)


See Methods for definition of scenarios. ¹²⁹Xe_deg corresponds to integrated amounts (in mol) of ¹²⁹Xe required to have been degassed from the mantle to account for the isotopic evolution of
atmospheric Xe. Φ_max are the maximal fluxes of mantle ¹²⁹Xe. ¹³⁰Xe_deg (%) corresponds to the fraction of the mantle ¹³⁰Xe inventory required to have been degassed, for mantle budget estimates
by Marty” and Halliday*®, and for a mantle ¹²⁹Xe/¹³⁰Xe of 14.

Many ideas have been proposed to explain the origin of bipedalism in hominins and
suspension in great apes (hominids); however, fossil evidence has been lacking. It has
been suggested that bipedalism in hominins evolved from an ancestor that was a
palmigrade quadruped (which would have moved similarly to living monkeys), or
from a more suspensory quadruped (most similar to extant chimpanzees)’. Here we
describe the fossil ape Danuvius guggenmosi (from the Allgäu region of Bavaria) for
which complete limb bones are preserved, which provides evidence of a newly
identified form of positional behaviour: extended limb clambering. The
11.62-million-year-old Danuvius is a great ape that is dentally most similar to
Dryopithecus and other European late Miocene apes. With a broad thorax, long lumbar
spine and extended hips and knees, as in bipeds, and elongated and fully extended
forelimbs, as in all apes (hominoids), Danuvius combines the adaptations of bipeds
and suspensory apes, and provides a model for the common ancestor of great apes
and humans.


Many studies since the nineteenth century have investigated the ori- 
gin of human bipedalism. From Darwin and Huxley to the present, 
many researchers have added insights into this question but with little 
or no fossil evidence in support? *. Although many fossils have been 
discovered, none has shed light directly on this central question in 
palaeoanthropology. 

Since the 1970s, many fossil apes from the middle to late Miocene 
epoch (13-5.3 million years ago (Ma)) from Europe have been discov- 
ered and described, along with smaller samples from the same time 
period in Africa*’. Apes and humans are thought to have diverged at 
this time®. Some of these discoveries include partial skeletons?””, but 
none shows preservation of completely intact long bones. Although 
opinions vary as to the relationship of these hominids to living homi- 
nids, nearly all researchers recognize European late Miocene apes as 
hominids as opposed to the stem hominoids of the early and middle 
Miocene epoch of Africa®™””. 

Postcranially, the most complete fossils from Europe include the 
well-preserved remains of the small bones of the hand, fragments of 
the long bones of the limbs, a partial pelvis and partially preserved 
vertebrae. These discoveries have provided insights into the anatomy of 
late Miocene apes. We know that these apes, including Pierolapithecus, 
Dryopithecus, Hispanopithecus and Rudapithecus, were suspensory 
and similar to modern great apes to varying degrees. However, with- 
out complete long bones of the limbs and well-preserved joint sur- 
faces (especially of the lower limbs), interpretations of details of the 
positional behaviour of these apes remain limited. 

Reconstructing the ancestral form of positional behaviour of great
apes and humans is best accomplished through the analysis of fossils.
On the basis of comparisons of Ardipithecus, extant catarrhines and
Miocene apes, it has been argued that human bipedalism evolved from a
form of arboreal quadrupedalism in the last common ancestor of great
apes and humans". Others have argued that bipedalism arose from
a more suspensory ancestor, based largely on fossil evidence of late
Miocene hominids®”. These scenarios are based on fragmentary fossil
evidence. Here we present a different scenario based on our analysis of
a well-preserved dryopithecin ape from Bavaria. The ulna, femur, tibia,
vertebrae, hand and foot bones of this ape reveal unknown aspects of
the anatomy of late Miocene apes and enable us to reconstruct what
may be the ancestral morphology of the great apes and humans.


Extended limb clambering 


The fossils (Fig. 1) include remains of at least four individuals, with 
a partial skeleton that is sufficiently complete to describe the mor- 
phology of the limbs and spine and proportions of the body in detail. 
The results reveal a combination of anatomical features that are 
indicative of a pattern of arboreal behaviour that we term extended 
limb clambering (ELC). It is characterized by generalized limb pro- 
portions superimposed on a unique combination of knee, ankle, 
elbow and wrist postures and strongly grasping extremities. ELC 
incorporates powerful hallucal grasping, plantigrade feet, extended 
hip and knees, wide ranging elbow flexion-extension and prona- 
tion-supination, a mobile wrist, and hands with curved phalanges 
and a deep first metacarpal joint. It differs from previously identi- 
fied forms of positional behaviour. Plantigrade and palmigrade 
quadrupeds (Old World monkeys and Ekembo) lack the suspensory 
attributes of the forelimb and the extension set of the knee. Knuckle- 
walkers (chimpanzees, bonobos and gorillas) lack the extended knee 
and have less powerfully developed hallucal and pollical grasping. 
The hand phalanges of Danuvius also lack the robusticity typical of 
knuckle-walkers. Arboreal clambering orangutans lack the weight-bearing
adaptations present in the knee and ankle of Danuvius and
have features that much more strongly emphasize forelimb pos- 
tural and locomotor adaptations. Danuvius is distinguished from all 
known catarrhines in its vertebral morphology, with an elongated 
lumbar region combined with spinal invagination/lordosis, which 
shifts the body mass over the expanded proximal tibial joint surfaces. 
The uniqueness of ELC is that it does not favour the forelimb or the 
hindlimb, as in most primates, but utilizes both limbs in roughly 
equal proportions. ELC includes a combination of joint positions
and loading patterns of both hominin bipedalism, which emphasizes
hindlimb extension and spinal curvatures, and extant great ape
suspension, which emphasizes powerful and mobile forelimbs. We
propose ELC as a new model of the ancestral mode of positional
behaviour of the last common ancestor of living great apes and 
humans. ELC is a precursor to obligate bipedalism, which shifts the 
emphasis of positional behaviour to the hindlimbs, and to suspen- 
sion, in which the emphasis shifts to the forelimbs. 


Systematic palaeontology 


Order Primates Linnaeus, 1758 
Infraorder Catarrhini Geoffroy, 1812 
Family Hominidae Gray, 1825 
Danuvius guggenmosi gen. et sp. nov. 

Fig. 1 | Fossil remains of four D. guggenmosi
individuals from late Miocene sediments of
Hammerschmiede. a, Holotype GPIT/MA/10000
male individual. b-d, Paratype individuals
GPIT/MA/10003 (female), GPIT/MA/10001 (female) and
GPIT/MA/10002 (juvenile). An excavation plan and a
complete list of all elements can be found in Extended
Data Fig. 1 and Supplementary Table 2. The scale bar is
20 mm for all bones and 10 mm for all isolated teeth.

Etymology. The genus name is derived from the Celtic-Roman river god
Danuvius. The trivial name honours the discoverer of the Hammer- 
schmiede locality, Sigulf Guggenmos. 

Holotype. Partial skeleton of male individual GPIT/MA/10000, com- 
prising 21 elements (Fig. 1a): partial left mandible with M, and M,, partial 
left maxilla with P?-M?, isolated mandibular (left I, P;; right P;, M,, M;) 
and maxillary teeth (right P°), first and transitional thoracic vertebrae, 
left humeral shaft fragment, right ulna, left metacarpal I fragment, right
proximal manual phalanges II and IV, two left intermediate manual 
phalanx fragments, right femoral head, right patella, left tibia, left 
proximal pedal phalanx I. 

Paratypes. Two smaller adults (GPIT/MA/10001 (Fig. 1c), comprising 
left P?, M?, left femur head; and GPIT/MA/10003 (Fig. 1b), comprising 
left I, 1,, fragments of M,, M’, M?, left femur, proximal hallucal phalanx 
fragment) and one juvenile individual (GPIT/MA/10002 (Fig. 1d), com- 
prising unerupted left P;, left I’, left and right DP,, right DP*, epiphysis 
of the intermediate manual phalanx). 

Locality and horizon. Hammerschmiede clay pit near Pforzen (Allgäu
region, Bavaria, Germany, Extended Data Fig. 1; 47.923° N, 10.588° E); level
Hammerschmiede (HAM) 5 at stratigraphic metre 12 in the local section,
which has been dated magnetostratigraphically to 11.62 million years ago”.

Diagnosis. Small hominid ranging in size from about 17 to 31 kg. The 
palate is narrow and deep with a thick palatine process; the maxilla is 
high, anteroposteriorly broad, with an anteriorly facing zygomatic root 
above the distal moiety of P*, maxillary sinus invaginating the zygo- 
matic and alveolar processes, canine fossa deep and narrow, canine 


Fig. 2 | D. guggenmosi holotype. a, Palate (left; right side mirror-imaged) and
left maxilla from superior (middle) and lateral (right) views, with a
three-dimensional rendering of dental roots and maxillary sinus (blue). The sinus is
invaginated by the posterobuccal and lingual roots of M? and is superior to the
roots more anteriorly (dashed black line). Laterally the sinus extends into the 
zygomatic root (dashed white line); additional images are shown in Extended 
Data Fig. 10. b, Left proximal hallucal phalanx in lateral (left), plantar (middle) 
and medial (right) views. c, Right proximal hand phalanx 2 in palmar (left), ulnar 
(middle) and proximal (right) views. d, Right proximal hand phalanx 4 in plantar 
(left) and ulnar (right) views. e, Tibial proximal (top) and distal (middle) 
articulations (anterior is up) and sagittal computed tomography cross-section 
through the middle of the lateral condyle (bottom; superior is up). f, First 
thoracic vertebra in superior (left) and left-lateral (right) views. 

g, Diaphragmatic vertebra in posterior (left), superior (middle) and right- 
lateral (right) views. Scale bars, 10 mm. 


root alveolus vertically oriented; I' mesiodistally narrow, high-crowned
with a strong lingual pillar and mesial marginal ridge; postcanine
dentition with strongly developed crista, P* lacks the paraconule, molars
are broad relative to the length with compressed trigons and thick
enamel; mandibular corpus is low, robust with a prominent mandibular
eminence and a broad extramolar sulcus; ulna has a straight shaft, 
moderately deep proximally, short olecranon, deep, strongly keeled, 
anteriorly oriented trochlear notch, large, laterally oriented radial 
notch, large head, short, non-articular styloid process; first metacarpal 
base strongly dorsopalmarly curved saddle-shaped joint; proximal 
hand phalanges are long, curved, with strongly developed flexor sheath 
ridges; femur head projects above the greater trochanter, extension 
of joint surface onto the superoposterior surface of femoral neck, 
neck compressed and strongly vertically oriented; tibia with broad 
proximal end, thickened metaphyses, mediolaterally concave condylar 
surfaces, lateral condyle anteroposteriorly flat, deeply incised and 
posteriorly oriented intercondylar notch, prominent intercondylar 
eminences, trochlear surface roughly square-shaped, strongly keeled, 
prominent malleolus deeply notched at its base with an anterolaterally 
expanded joint surface; patella with broad, flat joint surface; proxi- 
mal hallucal phalanx is large, robust at mid shaft, broad proximally, 
prominent flexor sheath ridges, strong lateral torsion of the distal 
end; first thoracic vertebra with short, divergent pedicles, strongly 
divergent zygapophyseal orientations, univertebral rib articulation; 
penultimate or antepenultimate diaphragmatic vertebra with a promi- 
nent metapophysis. 

Differential diagnosis. The craniodental morphology of Danuvius 
is diagnostically dryopithecin (‘Expanded differential diagnosis of D.
guggenmosi’ in the Methods). The anterior palate (Fig. 2a) is short in
comparison with pongines, with a stepped subnasal fossa, as is typical 


Fig. 3 | D. guggenmosi, right ulna (GPIT/MA/10000-10) and left tibia
(GPIT/MA/10000-15). a-c, Anterolateral (a) and medial (b) views of the ulna and the
reconstructed proximal end in lateral view (c). d-f, Posterior (d) and anterior
(e) views of the tibia and the distal epiphysis in anterior view (f). Tibial shaft
cross-sections are given at 20%, 35% and 50% of shaft length from the distal
end. Additional images of the ulna and tibia are shown in Extended Data Fig. 4.
Scale bars, 20 mm (a-e) and 10 mm (f).


of dryopithecins and extant hominines. Danuvius is distinguished from
other dryopithecins in having a unique combination of facial attributes 
(compressed canine fossa, vertical canine implantation, anteriorly fac- 
ing malar surface, robust mandible, prominent mandibular eminence, 
wide extramolar sulcus; Extended Data Fig. 2). The proximal ulna dif- 
fers from Hispanopithecus and Rudapithecus in its anteriorly facing 
trochlear notch and expanded coronoid process (Fig. 3). The distal 
tibia differs from Hispanopithecus in its more squared outline and in 
details of articular morphology (see Supplementary Information for 
detailed descriptions and comparisons and Supplementary Tables 3-24 
for measurements). 


Limb proportions and posture 

The postcrania of Danuvius reveals numerous previously unknown 
aspects of dryopithecin morphology. Compared with the length of the 
tibia, Danuvius has a relatively elongated ulna (Fig. 4a and Extended 
Data Fig. 3), comparable to Pan paniscus. In Pongo, the ulna is longer 
whereas in cercopithecoids and early hominins it is shorter. On the 
basis of reconstructed lengths, Oreopithecus and Hispanopithecus have 
tibia:ulna ratios that are comparable to that of Danuvius. 

A mediolaterally broad thorax and orthogrady are inferred from the
dorsal orientation of the thoracic transverse processes, combined with a
low costal facet angle on the first thoracic vertebra” (Fig. 2f, g). Inferred 
from the difference in inclination of the spinous processes between 
the first vertebra and the lower thoracic vertebra, the upper spinal 
column was substantially curved (cervical lordosis/thoracic kyphosis)”. 

Fig. 4 | Body proportions and distal tibia articulation metrics. a, Ratio of
ulna-to-tibia physiologic length (natural logarithm) in relation to body mass (natural
logarithm of femur head diameter) of extant catarrhines (n=178; for raw data 
see Supplementary Table 7) compared to fossil hominoids (D. guggenmosi, 
GPIT/MA10000; Hispanopithecus laietanus, IPS 18000; Oreopithecus bamboli, 
IGF 11778; Ardipithecus ramidus, ARA-VP-6/500; Australopithecus prometheus, 
StW 573; Australopithecus afarensis, A.L.288-1; Ekembo heseloni, KNM-RU 2036; 
data are from previous studies*"**” *°). b, Plot of relative thickness of tibial 


D. guggenmosi is, to our knowledge, the first Miocene hominid with 
evidence of diaphragmatic vertebra placement, which is important 
in interpreting thoracolumbar spine evolution in hominoids. The 
well-developed costotransversal facet of GPIT/MA/10000-16 (Fig. 2g) 
indicates a non-ultimate thoracic position for the diaphragmatic 
vertebra and therefore a functionally longer lower back, as in early 
hominins, stem-hominoids and cercopithecids. On the basis of 
indirect evidence from the pelvis, a longer lower back has also been 
inferred for Rudapithecus. Extant hominoids including Homo show 
a diaphragmatic placement at the ultimate thoracic vertebra level. 
The contrasting vertebral configuration of Danuvius suggests that 
diaphragmatic cranial displacement is the symplesiomorphic hominoid 
condition, supporting the long-back model. The increased number 
of functional lumbar vertebrae allows sagittal flexibility to lordose the 
lumbar column, which helps to position the centre 
of mass effectively over extended hips, knees and plantigrade feet (see below), 
implying at least some degree of habitual bipedal posture. 


Positional behaviour 


Several skeletal elements of the upper limb bear unmistakable hallmarks of 
below-branch or suspensory positional behaviour (Fig. 3a-c and Extended 
Data Fig. 4). Despite the pathology evident on the ulna (Supplementary 
Information), these include a reduced olecranon process, broad, keeled 
trochlear notch with prominent medial and lateral surfaces for a trochleaform 
humeral trochlea, large laterally oriented radial facet, robust 
proximal ulnar shaft and a reduced, non-articular ulnar styloid process. The 
proximal hand phalanges are curved with prominent flexor sheath ridges 
(Fig. 2c, d and Extended Data Figs. 5, 6), indicating that suspension played 
an important, but not dominant, part in its locomotory repertoire (for 
example, more similar to Pan than to Pongo). Powerful pollical grasping 
and increased thumb mobility are indicated by the strong dorsopalmar 
and radioulnar curvatures of the base of the first metacarpal (Fig. 1a). 
The lower limb suggests postural extension at the hip and knee joints 
and a uniform force distribution in a stabilized ankle joint, combined 
with a powerful grasping hallux. On the femur (Fig. 1b and Extended 
medial malleolus and size-standardized anterior width of tibial distal 
articulation surface (measurements follow a previous study) of extant 
catarrhines, compared to fossil hominoids (D. guggenmosi, GPIT/MA/10000; H. 
laietanus, IPS 18000; Sivapithecus indicus, YGSP 1656; Nacholapithecus kerioi, 
KNM-BG 35250; E. heseloni, KNM-RU 2036, 3589; Proconsul major, NAP 1I’58; 
comparative data were obtained from previous studies; for raw data see 
Supplementary Tables 19, 20). C, Pan; G, Gorilla; P, Pongo. 


Data Fig. 7b-d), the low greater trochanter, the more vertically oriented 
neck and the posterosuperior expanded joint surface suggest that the 
femoral head articulated in habitual extension with an os coxae that 
was laterally rotated, which would have caused the iliac blade to be 
more tilted inferolaterally. This may have enhanced the function of 
the gluteal muscles as hip stabilizers (abductors) in bipedal posture, 
as in hominins. The flat patella (Fig. 1a and Extended Data Fig. 7a) and 
shallow rounded patellar surface suggest slow and deliberate movements 
(Supplementary Information). The absence of an anteroposterior 
convexity to the lateral tibial condyle (Fig. 2e and Extended Data 
Fig. 8), a character shared with hominins and hylobatids, suggests 
an extension set to the knee joint, as a flatter contour maximizes tibiofemoral 
contact area and joint stability during extended knee postures. 
A buttressing of the tibial metaphysis also reflects stereotypical 
extended knee postures under compressive load. The exceptional 
development of the intercondylar eminence is probably related to 
the presence of strongly developed cruciate ligaments. The subequal 
size of the tibial condyles indicates a more equally distributed weight 
transmission on the knee joint. Together, the morphology of the 
tibial plateau suggests an adaptation emphasizing an extended knee 
reinforced by strongly developed intra-articular ligaments. We interpret 
the distal tibia of Danuvius, with its mediolaterally short anterior 
trochlear margin and its mediolaterally narrow malleolus (Fig. 4b), to 
be an adaptation to a more uniform distribution of forces across the 
joint surface, with limited ankle loading in dorsiflexion and inversion 
compared to extant apes. The combination of the anteroposteriorly 
deep malleolus, medially expanded joint surface, prominent anterior 
margin with a strongly developed beak and strongly inclined medial 
and lateral trochlear surfaces produces a hinge-like morphology to 
the anterior talocrural joint, which would have been most stable with 
the foot roughly perpendicular to the long axis of the tibia. This is corroborated 
by the nearly perpendicularly orientated tibia relative to 
the horizontal plane of the ankle joint (Fig. 3d, e and Supplementary 
Information). Extant great apes, which load the ankle in inversion 
during climbing, have an obliquely oriented tibia relative to the plane 
of the ankle joint. The near perpendicular tibial angle is a shared 


character between hominins and Danuvius and supports the inference 
of a habitual valgus knee position and bipedalism for the new genus. 
A robust, elongated and strongly laterally torsioned hallux (Extended 
Data Figs. 5b, c, 9) with well-developed muscular attachments suggests an 
emphasis on powerful hallucal grasping with adducted ankle stabilized in 
a neutral position relative to the long axis of the tibia. In contrast to extant 
apes, the hallux was capable of interphalangeal hyperflexion, as indicated 
by the substantial plantar inter-condylar recess and depression (Fig. 2b), 
enabling Danuvius to securely grasp small-diameter arboreal supports. 


Discussion 


The uniqueness of D. guggenmosi is demonstrated by its small body 
size (between siamangs and bonobos; Supplementary Information 
and Supplementary Table 23) with limb proportions most similar to 
bonobos (Fig. 4a), a cranially shifted diaphragmatic vertebra (Fig. 2g), 
a strong grasping hallux (Fig. 2b) and a morphology of the tibia that is 
surprisingly similar to hominins (large-sized and flat lateral condyle 
with ‘buttressed’ plateau, tibial shaft perpendicular to talar facet, 
mediolaterally narrow malleolus and short anterior trochlear margin) 
(Fig. 3d-f, Extended Data Fig. 4 and Supplementary Information). The 
combination of morphological attributes of the limbs and vertebra of 
Danuvius points to a newly recognized form of positional behaviour, 
extended limb clambering (ELC). 
In contrast to suspensory behaviour, clambering and arm-assisted 
bipedalism in Pongo or climbing and suspension in African apes, ELC 
involves equal contributions of the fore- and hindlimbs. The foot is flat 
and adducted on horizontal to mildly inclined branches with a hallux 
capable of powerful grasping, stabilizing the hindlimb. Torques resulting 
from body rotation above the knee are countered by powerfully 
developed cruciate ligaments. The knee is habitually extended and 
supported by a thickened plateau and large, flat-to-concave, proximally 
facing condyles. The elbow is capable of a full range of flexion-extension 
and pronation-supination as in extant hominoids. The hand was 
strong enough to generate the force to counter torques in a variety 
of positions ranging from suspensory to palmigrade, but without the 
hyperextension at the metacarpophalangeal joints that characterizes 
Old World monkeys and Pierolapithecus. This newly defined locomotor 
category includes attributes of orthograde suspension and hominin 
bipedalism, making it a potential candidate for the positional behaviour 
of the last common ancestor of great apes and humans. Danuvius 
provides fossil evidence that hominin bipedalism and great ape suspension 
evolved from a form of arboreal locomotion that incorporates 
attributes of each, which has roots in the middle Miocene of Europe. 
Methods 


Geology, age, fossils and taphonomy 

The HAM 5 channel represents a riffle pool sequence of a small and shallow 
meandering rivulet with a talweg width of 4-5 m and a maximum 
pool depth of 1 m. The gravelly bed load is composed exclusively of 
reworked pedogenic carbonate concretions that are typically 4-8 mm in 
diameter. Similar concretions are abundant in Bk palaeosol horizons of 
the bedrock, indicating a local source for the HAM 5 rivulet. Magnetostratigraphy 
of the local 26-m thick section, combined with a nearby 150-m 
deep drill core, dated the channel fill to 11.620 million 
years ago (±5 thousand years), directly at the base of the Tortonian, 
late Miocene. Excavation of about 200 m² between 2011 and 2018 
revealed a high vertebrate diversity that comprised 100 species of 
fishes, amphibians, reptiles, birds and mammals (see Supplementary 
Table 1 for faunal list). Hominids are a common element in this thanatocoenosis, 
representing about 10% of all excavated large mammal 
individuals. Excavation demonstrates that fossil vertebrates are found 
exclusively along the channel, suggesting some sort of accumulation. 
Most finds are disarticulated skeletal elements, which tend to be complete 
in small- and medium-sized mammals (for example, carnivores, 
artiodactyls and primates) and broken and sometimes abraded in large-sized 
taxa (for example, perissodactyls and proboscideans). Skeletal 
articulation occurs in rare cases. However, many medium-sized individuals 
are documented by associated specimens found within a few 
square metres, suggesting minor transport and sorting of bones. The 
21 bones and teeth from the most complete hominid individual GPIT/MA/10000 
represent about 15% of the skeleton. These remains were found within the 
talweg over a maximum distance of 20 m, except the first thoracic vertebra, 
which was found a further 10 m downstream. Moderate sorting of 
GPIT/MA/10000 is documented by proximal concentration of isolated 
teeth, followed by skull elements and more distally long bones and 
phalanges, whereas vertebrae are transported furthest down the channel 
(Extended Data Fig. 1). This arrangement follows experimentally 
observed patterns of bone taphocoenosis in rivers. 


Fossil repository 

All Hammerschmiede fossils are stored in the palaeontological collection 
of the University of Tübingen (acronym GPIT), a research 
infrastructure of the Senckenberg Institute for Human Evolution and 
Palaeoenvironment (SHEP) Tübingen. 


Bone preservation 

The Hammerschmiede locality is an active clay-mining pit. Sedi- 
ments from the fossiliferous rivulet channel HAM 5 are composed 
of fine-pebbly pedogenic carbonate nodules and marls with various 
degrees of silt and rare fine-sand admixture. Owing to mining activi- 
ties, water-saturated clay-rich sediments on steep section walls tend 
to creep and heavy machinery adds compressive load on the sediment 
surface. Therefore, postcranial long bones of smaller large mammals 
(for example, deer, tragulids, carnivores and primates) tend to be 
compressed at the shaft and occasionally laterally distorted. This 
strongly affected the complete femur of GPIT/MA/10003 (shaft com- 
pressed by machinery loading, folded along the shaft due to ground 
creeping), which was embedded in soft clay. The complete ulna of 
GPIT/MA/10000 is uncompressed, but at midshaft the cortical bone 
of the down-lying side is crushed and pushed into the shaft, probably 
by load compression. Computer tomographic images show that this 
preservation was facilitated by midshaft osteoporosis. By contrast, the 
complete tibia of GPIT/MA/10000, embedded in a less compressible 
silt-dominated matrix, is not crushed along the shaft, but laterally 
distorted at the tuberosity and slightly damaged at medial condyle and 
distal metaphysis, which are the result of excavation artefacts. Impor- 
tantly, all cranial and small postcranial ape specimens (phalanges, 
metapodial, carpal bone and patella), as well as long-bone joint 


articulations remained undisturbed, but occasionally show small 
excavation artefacts. 


Length reconstruction 

To measure the total and physiologic length of distorted long bones, 
we use three-dimensional prints of virtual reconstructions for the 
holotype tibia and ulna (GPIT/MA/10000-10 and -15, respectively). 
The total length of the crushed paratype femur (GPIT/MA/10003-01) 
is estimated with an accuracy of about +5 mm. 


Expanded differential diagnosis of D. guggenmosi 

The molars lack cingula and are elongated relative to length, with 
peripheralized cusps. These attributes and P, cusp morphology, P, 
length and M,-M, proportions distinguish Danuvius from Ekembo 
and other early Miocene hominoids. The dentition is readily distin- 
guished from thickly enamelled middle and late Miocene apes such 
as Kenyapithecus, Nacholapithecus, Griphopithecus, Sivapithecus and 
Ouranopithecus. 

The maxilla of D. guggenmosi (Figs. 1a, 2a and Extended Data Figs. 2a, 3a) 
differs from Anoiapithecus, Pierolapithecus and Dryopithecus in its 
anteroposteriorly broad zygomatic root (zygomatico-alveolar crest) 
and convex and postero-inferiorly inclined temporal surface; deeply 
invaginated maxillary sinus floor; vertically implanted upper male 
canine (supero-inferiorly and mediolaterally); deep, anteroposteriorly 
narrow canine fossa and anteriorly facing zygoma. It differs from 
Hispanopithecus and Rudapithecus maxillae in its deep, anteroposteriorly 
narrow canine fossa and anteriorly facing zygoma, anteriorly 
positioned zygomaticoalveolar crest and deeper palate. Maxillary 
dentition differs from Anoiapithecus, Pierolapithecus and Dryopithecus 
by broader premolars; triangular P?; low mesial and distal P? buccal 
shoulders; more mesiodistally centralized premolar cusps (shorter 
talon); broad, concave premolar trigon and talon basins; more strongly 
developed molar crista; more peripheralized cusps; mesiodistally 
compressed trigon. I' differs from Pierolapithecus and cf. Dryopithecus 
sp. (La Grive) in its more strongly developed mesial marginal ridge 
and convex lingual surface. The maxillary dentition differs from 
Hispanopithecus and Rudapithecus in its low P? crown shoulders and 
broad upper premolars. The mandible (Fig. 1a and Extended Data 
Fig. 2b) differs from Anoiapithecus and Dryopithecus in its shallower, 
robust corpus (unknown in Pierolapithecus), prominent mandibular 
eminence and wide extramolar sulcus. Mandibular dentition differs 
from Anoiapithecus and Dryopithecus in its lower crowned, mesially 
more vertical P, with a prominent mesial beak; broader molar trigonid 
and talonid basins; shorter mesial fovea; absence of buccal cingula; 
elongated molars; short M, roots (not visible in Anoiapithecus). The 
mandible differs from Hispanopithecus and Rudapithecus mandibles in 
the same way as from Anoiapithecus, Pierolapithecus and Dryopithecus 
and from the lower teeth of Hispanopithecus and Rudapithecus in 
having restricted mesial and distal fovea. The mandibular dentition 
differs from Ouranopithecus as it is smaller with more thinly enamelled 
teeth and it differs in other attributes as in Rudapithecus and Hispano- 
pithecus. It also differs from Oreopithecus in having lower postcanine 
cusps, less strongly developed crista/cristids, no centroconid, higher 
P, talonid, higher crowned I’, no upper postcanine lingual cingula. 
The maxilla differs from early and middle Miocene hominoids in the 
high position of the zygomatic root. The dentition differs from early 
and middle Miocene hominoids in the absence of molar cingula, first 
and second molars of similar size, peripheralized molar cusps, more 
vertical mesiobuccal P? surface and short P* shoulders, and higher 
P, talonid. 

The partial skeleton GPIT/MA/10000 includes dental and postcra- 
nial remains that are much larger than the other Hammerschmiede 
individuals. This along with the strongly flared mesiobuccal face of 
the P, (Fig. 1a and Extended Data Fig. 2g, j) and the large, elongated 
canine alveolus (Fig. 2a) strongly imply that GPIT/MA/10000 is a male. 


Body mass calculations 

For the calculation of the body mass of the individuals, we used metric 
traits (individual measurements) from hind limbs (femur and tibia) 
because they are most involved in weight carrying during locomotion 
in great apes. Our univariate body-mass predictions are based 
on regression equations from a previously published study for sex/species 
means of hominoids. In addition, as we can show that body 
proportions of the male individual GPIT/MA/10000 fall within the range 
of bonobos and chimpanzees, we assume a comparable scaling pattern 
and apply regression equations established previously for femur head 
size of the genus Pan. Both methods produce very similar results for 
the male individual within the 50% confidence interval (Supplementary 
Table 23). Femur sizes of the two female specimens GPIT/MA/10001 and 
GPIT/MA/10003 are significantly smaller than those of any extant great ape, 
and hence outside any hominid comparative sample. We therefore use 
the previously compiled regression equations for the total primate 
sample (hominoids plus cercopithecoids) for the predictor femur head 
size and cercopithecoid equations for predictions based on femoral 
condyle breadth (as recommended in the previously published study). 


Calculations of enamel thickness 

We used the right M, of the holotype (GPIT/MA/10000-03) to calculate 
enamel thickness given its low occlusal wear (slightly higher on mesial 
half, wear stage 1-2 according to a previously published study). This 
tooth was scanned with a FF35 CT at the YXLON Application centre in 
Heilbronn (Germany) and captured at 170 kV and 55 µA (500-ms exposure 
time), obtaining a voxel size of 15.8 µm (Extended Data Fig. 3b). 
Following a previously published study, virtual buccolingual sections 
of the molar were performed using Avizo 9.0. Mesial and distal virtual 
sections were defined by the tips of the metaconid-protoconid and 
entoconid-hypoconid perpendicular to the cervical plane. The following 
variables were measured two-dimensionally in both planes: 
dentine area (b), enamel cap area (c), length of the enamel-dentine 
junction (e) and the bi-cervical diameter. The average enamel thickness 
was calculated as c/e and the relative enamel thickness (RET) was 
calculated as previously described using $\mathrm{RET} = 100 \times (c/e)/\sqrt{b}$. For 
GPIT/MA/10000-03, the RET = 19.36, based on data from the least worn distal 
section (Supplementary Table 6). 
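
The calculation of average and relative enamel thickness can be sketched in a few lines of Python. This is a minimal illustration only: it assumes the widely used definition RET = 100 × (c/e)/√b, and the function names and input values are hypothetical (they are not the measurements reported in Supplementary Table 6).

```python
import math


def average_enamel_thickness(c: float, e: float) -> float:
    """Average enamel thickness: enamel cap area (c) divided by EDJ length (e)."""
    if e <= 0:
        raise ValueError("EDJ length must be positive")
    return c / e


def relative_enamel_thickness(b: float, c: float, e: float) -> float:
    """Relative enamel thickness, assuming RET = 100 * (c / e) / sqrt(b).

    b: dentine area, c: enamel cap area, e: enamel-dentine junction length.
    Areas and lengths must share the same base unit (e.g. mm^2 and mm),
    which makes the index dimensionless.
    """
    if b <= 0:
        raise ValueError("dentine area must be positive")
    return 100.0 * average_enamel_thickness(c, e) / math.sqrt(b)


# Hypothetical section measurements (mm^2, mm^2, mm), for illustration only.
ret = relative_enamel_thickness(b=25.0, c=20.0, e=20.0)
print(f"RET = {ret:.2f}")
```

Dividing by the square root of the dentine area scales the linear thickness by a linear proxy for tooth size, so that teeth of different sizes can be compared.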


Ellipse estimates of lateral tibial condyle curvature 

To estimate the shape of the lateral tibial condyle, we performed a cut 
through the sagittal mid-line of the condyle on the three-dimensional 
scans of tibiae from D. guggenmosi (Fig. 2e) and extant catarrhines 
(Extended Data Fig. 6) using an Artec Space Spider with Artec Studio 11 
(three-dimensional scans) and Avizo 9 (cross-sections). Subsequently, 
the cross-sections were digitalized and a best-fit ellipse was obtained 
using a non-iterative MATLAB function (‘EllipseDirectFit’; from 
N. Chernov (code available from https://www.mathworks.com/)). To 
compare the individual ellipses, we further calculated the eccentricity 
$e = \sqrt{1 - (b/a)^2}$, in which a and b are the semi-major and semi-minor 
axes with a ≥ b. The closer e is to 1, the more elongated the ellipse is, 
whereas e = 0 represents a circle. 
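
As a hedged illustration of the eccentricity measure, the following Python sketch takes the two semi-axes of an already fitted ellipse and returns the eccentricity in its standard form, e = √(1 − (b/a)²) with a ≥ b. The function name and the example axis lengths are hypothetical; the ellipse fitting itself (done in MATLAB in this study) is not reproduced here.

```python
import math


def eccentricity(axis_1: float, axis_2: float) -> float:
    """Eccentricity of an ellipse from its two semi-axes.

    The longer axis is treated as the semi-major axis a and the shorter
    as the semi-minor axis b, so the argument order does not matter.
    """
    a, b = max(axis_1, axis_2), min(axis_1, axis_2)
    if b <= 0:
        raise ValueError("semi-axes must be positive")
    return math.sqrt(1.0 - (b / a) ** 2)


# Hypothetical semi-axes (arbitrary units), for illustration only.
print(eccentricity(5.0, 5.0))  # a circle
print(eccentricity(5.0, 3.0))  # a moderately elongated ellipse
```

Because the ratio b/a is unitless, the result does not depend on the scale of the scan, which is what allows condyles of differently sized taxa to be compared directly.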


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


All data generated or analysed during this study are included in this 
published Article (and its Supplementary Information). The computed 
tomography scans are available from the corresponding author on 
reasonable request. The new taxon has the following Life Science 
Identifier: http://zoobank.org/References/E1573024-9543-4B1E-A79B-6E40896A4617. 
Extended Data Fig. 1 | Localization of Hammerschmiede locality and 
excavation plan with localized D. guggenmosi specimens. a, Topographical 
map of Europe. b, Magnification of the western part of the south German 
Molasse Basin (North Alpine Foreland Basin). The Hammerschmiede locality 
(47° 55’ 37” N, 10° 35.5’ E) is highlighted with a black star. Both maps were 
created using Generic Mapping Tools and topographic datasets ETOPO1 
and SRTM3. c, Excavation plan of the HAM 5 layer (the section has previously 
been published) with excavated areas coloured in grey. Intermediate regions 
represent material lost due to clay mining. Dashed lines indicate the 
reconstructed thalweg course of the palaeochannel. Different colours and 
symbols indicate the individual context: holotype (GPIT/MA/10000) adult 
male marked in red (stars), paratype (GPIT/MA/10001) female 1 in blue 
(diamonds), paratype (GPIT/MA/10002) juvenile individual in yellow (circles) 
and paratype (GPIT/MA/10003) female 2 in green (triangles). The red encircled 
sector indicates removed and stored sediments that were screen washed 
separately. This area was under threat of destruction from quarry activity. To 
avoid the complete loss of this sediment, approximately 25 tonnes were 
removed for remote processing. Two specimens were recovered in situ in this 
area. Five other specimens from this area were recovered during subsequent 
screen washing and cannot be more precisely localized. Coordinates 
correspond to Gauss-Krüger Zone 4 grid with easting (R) and northing (H) in 
metres. 
Extended Data Fig. 2 | D. guggenmosi, dental and cranial specimens. a, Left 
maxilla with P?>-M? (GPIT MA/10000-01) in lateral, anterior, medial (top), 
palatal, posterior, superior (bottom) views. b, Left mandible (GPIT MA/10000- 
02) in lateral, anterior, medial and occlusal views. c, Left upper central incisor 
(GPIT MA/10002-01) in labial, lingual and occlusal views. d, Right upper P® 
fragment (GPIT MA/10000-05) in buccal, occlusal and mesial views. e, Left P? 
(GPIT MA/10001-03) in buccal, occlusal and mesial views. f, Right upper M! 
(GPIT MA/10001-01) in occlusal, medial, distal and buccal views. g, Left lower P; 
(GPIT MA/10000-07) in medial, buccal, lingual and occlusal views. h, Left lower 
lateral incisor (GPIT MA/10003-5) in distal, mesial, lingual and labial views. i, 
Left lower central incisor (GPIT MA/10000-08) in distal, mesial and lingual 
views. j, Right lower P; (GPIT MA/10000-06) in mesial, distal, buccal and 
occlusal views. k, Right lower M, (GPIT MA/10000-03) in lingual, buccal (top), 
mesial, distal (bottom) and occlusal views. l, Right lower M, (GPIT MA/10000-04) 
in lingual, mesial (top), buccal, distal (bottom) and occlusal views. Scale 
bar, 10 mm. 
Extended Data Fig. 4 | D. guggenmosi, additional views of right ulna (GPIT 
MA/10000-10) and left tibia (GPIT MA/10000-15). a-d, Lateral 
(a), anteromedial (b) and posterior (c) views of the ulna and the reconstructed 
olecranon in anterior view (d). e, f, Medial (e) and lateral (f) views of the tibia. 
Scale bar, 20 mm. 
Extended Data Fig. 5 | Ulnar trochlear notch, phalangeal, metacarpal and 
tibial midshaft comparisons. a, Ulnar trochlear notch angle (for raw data, see 
Supplementary Table 9). b, Hallucal proximal phalanx (PP1) torsion (for 
measurement, see Methods; for raw data, see Supplementary Table 23). c, Size- 
adjusted hallucal proximal phalanx (PP1) midshaft robusticity (MLms x DPms/ 
GM in which MLms is the mediolateral width at midshaft, DPms is the 
dorsopalmar height at midshaft and GM is the geometric mean of the seven 
measurements: ML and DP at proximal, distal and midshaft, and total length; 
for raw data, see Supplementary Table 22). d, Size-adjusted second manual 
proximal phalanx (PP2) gracility (TL/GM in which TL is the total length and GM 
is the geometric mean of five measurements: ML and DP at distal and midshaft, 
and TL; five measurements are used to include Pierolapithecus catalaunicus, in 
which the proximal articulation is damaged; for raw data, see Supplementary 
Table 11). e, Manual phalangeal base, ratio of mediolateral (ML) to dorsopalmar 
(DP) length (for raw data, see Supplementary Tables 11, 12). f, Manual 
metacarpal 1 base, ratio of dorsopalmar to radioulnar (RU) length (for raw data, 
see Supplementary Table 10). g, Relative size of manual metacarpal 1 base 
(geometric mean of dorsopalmar and radioulnar lengths) to proximal phalanx 
of ray 2 (geometric mean of seven measurements; for raw data, see 
Supplementary Tables 10, 11). h, Tibial cross-section at midshaft (ratio of 
anteroposterior and mediolateral width; for raw data, see Supplementary 
Table 21). Sample sizes (n) of biologically independent animals are reported in 
parentheses below each box plot. All box plots show the centre line (median), 
box limits (upper and lower quartiles), crosses (arithmetic mean), whiskers 
(range) and individual values (circles). 
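
The size adjustment used in panels c and d can be sketched as follows. This Python snippet is a minimal illustration of dividing a shape variable by the geometric mean (GM) of a set of linear measurements; the function names and all measurement values are hypothetical and do not come from the Supplementary Tables.

```python
import math


def geometric_mean(values):
    """Geometric mean of positive measurements, computed via logarithms."""
    if not values or any(v <= 0 for v in values):
        raise ValueError("all measurements must be positive")
    return math.exp(sum(math.log(v) for v in values) / len(values))


def size_adjusted_robusticity(ml_ms, dp_ms, measurements):
    """Midshaft robusticity MLms x DPms divided by GM of the measurements."""
    return ml_ms * dp_ms / geometric_mean(measurements)


# Hypothetical hallucal proximal phalanx (mm): ML and DP at the proximal end,
# midshaft and distal end, followed by total length (seven measurements).
phalanx = [12.0, 10.0, 8.0, 6.0, 9.0, 7.0, 30.0]
index = size_adjusted_robusticity(ml_ms=8.0, dp_ms=6.0, measurements=phalanx)
print(f"size-adjusted robusticity = {index:.2f}")
```

Using the geometric mean rather than a single dimension as the size proxy keeps the adjustment from depending on any one measurement of the bone.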
Extended Data Fig. 6 | Curvature of manual proximal phalanges. Box plots of 
the first polynomial coefficient (A) of the second-order polynomial function 
representing phalangeal shaft curvature. The box represents the interquartile 
range, which represents 50% of the sample values. The whiskers are lines that 
extend from the interquartile range box to the highest and lowest values, 
excluding outliers. The line across the box indicates the median sample value 
for coefficient A. Extant primates are colour-coded according to locomotor 
adaptation. Taxa are arranged according to ascending median phalangeal shaft 
curvature. Sample sizes (n) of biologically independent animals are reported in 
parentheses after the species names. 
Extended Data Fig. 7 | D. guggenmosi, patella and femora. a, Right patella 
(GPIT MA/10000-12) in external and internal views. b, Right femur head (GPIT 
MA/10000-11) in medial, anterior, posterior (top), superior and lateral 
(bottom) views. c, Left femur head (GPIT MA/10001-02) in medial, posterior, 
anterior (top), superior and lateral (bottom) views. d, Left femur, proximal half 
(GPIT MA/10003-01) in anterior (top) and posterior (bottom) views. Scale bar, 
10 mm. 
Extended Data Fig. 8 | Ellipse estimates of lateral tibial condyle. Best-fit 
ellipses to digitalized portions of sagittal cross-sections through lateral tibial 
condyle of D. guggenmosi and extant catarrhines. Digitalized dots are shown in 
colour and best-fit ellipses in black. Orientation of ellipses follows the lateral 
condyle orientation (dorsal is up, anterior is left) at the same scale (scale bar, 
20 mm). Inset shows calculated values of eccentricity for the obtained ellipses. 
Results indicate that both Danuvius and extant humans have a flat lateral tibial 
condyle (eccentricity >0.85), whereas great apes exhibit a convex lateral 
condyle (eccentricity <0.80) and Cercopithecus occupy an intermediate 
position. 
Extended Data Fig. 9 | Hallux length and robusticity. a, Ratio (natural 
logarithm) of proximal hallucal phalanx total length to tibial physiologic 
length, relative to body mass (maximum femur head diameter). b, Box plots of 
hallux to femur head diameter ratios (natural logarithm). Box plots show the 
centre line (median), box limits (upper and lower quartiles), cross (arithmetic 
mean), whiskers (range) and individual values (circles). c, Size-adjusted hallucal 
phalanx midshaft robusticity (for explanation, see Extended Data Fig. 8c), 
relative to femur head diameter. All sample sizes (n) of biologically 
independent animals are reported in parentheses after the species names. For 
raw data, see Supplementary Tables 7, 22. 


Extended Data Fig. 10 | D. guggenmosi, maxillary sinus and enamel 
thickness. a, Left maxilla with three-dimensional rendering of molar roots and 
maxillary sinus (blue) in lingual (left), anterior (middle) and occlusal (right) 
views. Sinus runs deep between the posterobuccal and lingual roots of M?, 
rising anteriorly (dashed black line). Laterally the sinus extends deep into the 
zygomatic root (dashed white line). b, c, Enamel thickness measured on right 
M, (GPIT/MA 10000-03). Computed tomography image of the cross-section at 
distal sectional plane (b) and graphical conversion (c; grey, enamel; dark grey, 
dentine). 
When statistical analyses are reported, confirm that the following items are present in the relevant location (e.g. figure legend, table legend, main 
The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


An indication of whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistics including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) AND 
variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) 


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable. 


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Clearly defined error bars 
State explicitly what error bars represent (e.g. SD, SE, CI) 


Our web collection on statistics for biologists may be useful. 


Software and code 


Policy information about availability of computer code 
Universität Erlangen-Nürnberg, Germany) and a YXLON FF35 CT scanner at YXLON Inspection Service facility (Heidelberg, Germany). 
Comparative data of extant species were collected by using an Artec Space Spider surface scanner and Artec Studio software (versions 
11-14). 


Data analysis For the micro CT-scan data analysis, we used Avizo 9.0 (ThermoFisher Scientific) and Geomagic Wrap 2017 (3D Systems Software) for the 
virtual reconstruction of long bones. 3D prints were generated with Z-Suite 2.11 and printed on a Zortrax M200 FDM printer. Lateral tibia 
ellipse estimates were obtained using the MATLAB function “EllipseDirectFit” of Nikolai Chernov, available from mathworks.com 
- A description of any restrictions on data availability 


All data generated or analysed during this study are included in the published article (and its supplementary information files). The CT-scans analysed during the 
Life sciences Behavioural & social sciences Ecological, evolutionary & environmental sciences 


For a reference copy of the document with all sections, see nature.com/authors/policies/ReportingSummary-flat.pdf 


Ecological, evolutionary & environmental sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Study description Morphologic description and functional interpretation of fossil hominid specimens. 


Research sample The research sample consists of 36 original fossil hominid bones/teeth from Hammerschmiede. The extant primate sample for
skeletal comparison consists of about 350 adult, non-captive individuals of cercopithecids and hominids of both sexes.


Sampling strategy No sample size calculation was performed. The sample size of fossils is limited by availability. The size of extant comparative samples
(primates) varies between 10 and 60 individuals, which is a normal size in primatological anatomic comparisons.


Data collection Data from the original fossil specimens were collected by M.B., D.R.B., N.S. and J.F. Micro-CT and surface scan data processing and
collection were conducted by J.F. and A.T., in collaboration with A.S.D., U.K., T.L. and J.P.


Timing and spatial scale Data collection started in spring 2018, followed by comparative data collection from summer 2018 to summer 2019. 


Data exclusions No data were excluded from the analysis.
Reproducibility not applicable 

Randomization not applicable 

Blinding not applicable 

Did the study involve field work? Yes No 


Field work, collection and transport 


Field conditions not applicable for palaeontological excavations 


Location Hammerschmiede, Allgäu, Bavaria, southern Germany; coordinates N 47° 55’ 38.5”, E 10° 35.5’; fluvial channel of level HAM 5
at stratigraphic meter 12 in the local section, 685 m above sea level 


Access and import/export According to German (Bavarian) law, no permissions are needed for palaeontological excavations. Permission from the landowner
has been obtained.


Disturbance No disturbance (active mining pit) 


Reporting for specific materials, systems and methods 


Materials & experimental systems
n/a | Involved in the study
Unique biological materials
Antibodies
Eukaryotic cell lines
Palaeontology
Animals and other organisms
Human research participants

Methods
n/a | Involved in the study
ChIP-seq
Flow cytometry
MRI-based neuroimaging


Palaeontology 


Specimen provenance See above: according to German (Bavarian) law, no permissions are needed for palaeontological excavations. Informal permission
from the landowner has been obtained.
Specimen deposition All Hammerschmiede fossils are stored in the palaeontological collection of the University of Tübingen (acronym GPIT), a research
infrastructure of the Senckenberg Institute for Human Evolution and Palaeoenvironment (SHEP) Tübingen.


Dating methods No new dates are provided. 


Tick this box to confirm that the raw and calibrated dates are available in the paper or in Supplementary Information. 
In rapidly adapting asexual populations, including many microbial pathogens
and viruses, numerous mutant lineages often compete for dominance within the
population. These complex evolutionary dynamics determine the outcomes of
adaptation, but have been difficult to observe directly. Previous studies have used
whole-genome sequencing to follow molecular adaptation; however, these
methods have limited resolution in microbial populations. Here we introduce a
renewable barcoding system to observe evolutionary dynamics at high resolution in
laboratory budding yeast. We find nested patterns of interference and hitchhiking
even at low frequencies. These events are driven by the continuous appearance of new
mutations that modify the fates of existing lineages before they reach substantial
frequencies. We observe how the distribution of fitness within the population
changes over time, and find a travelling wave of adaptation that has been predicted
by theory. We show that clonal competition creates a dynamical ‘rich-get-richer’
effect: fitness advantages that are acquired early in evolution drive clonal expansions,
which increase the chances of acquiring future mutations. However, less-fit lineages
also routinely leapfrog over strains of higher fitness. Our results demonstrate that this
combination of factors, which is not accounted for in existing models of evolutionary
dynamics, is critical in determining the rate, predictability and molecular basis of
adaptation.


Rapidly adapting populations have complex evolutionary dynamics.
In these systems, adaptation is not limited by the supply of mutations.
Instead, numerous beneficial mutations arise simultaneously and drive
competing clonal expansions, often accompanied by neutral and
deleterious hitchhiker mutations. Studies have shown that this is
the dominant mode of adaptation in many bacterial and viral pathogens,
as well as in the somatic evolution of cancer and immune
repertoires. In these contexts, clonal interference and hitchhiking
have important consequences for the pace, outcomes and repeatability
of evolution.

This mode of rapid adaptation cannot be described by classical
evolutionary theory, because the fate of each mutation cannot
be modelled in isolation. Instead, selection acts on physically
linked combinations of alleles, which leads to complex co-dependency
between the fates of different mutations. This limits the efficiency
of selection and renders evolution less predictable: strongly
beneficial mutations can be outcompeted, whereas deleterious
mutations in good genetic backgrounds can spread through the
population.


Numerous studies have used whole-genome sequencing to investigate
these effects in laboratory microbial populations, and have
shown that clonal interference and hitchhiking are widespread.
However, limitations on sequencing depth make it impractical to
achieve a frequency resolution finer than a few per cent, and
barcoding-based methods that offer better resolution are limited
to short timescales. These limitations are critical in large microbial
populations, in which theory suggests that the fates of mutations are
often determined over long timescales by competition and hitchhiking
among rare high-fitness lineages, and that the vast majority of driver
mutations never reach detectable frequencies.


A renewable barcoding system


Here, we develop a renewable barcoding approach to observe evolu- 
tionary dynamics at high resolution on long timescales, by periodi- 
cally adding new barcodes to split each tracked lineage into labelled 
‘sublineages’ (Fig. 1a). Our approach uses three orthogonal lox sites: 
Cre-mediated recombination occurs between sites of the same type, 
Fig. 1 | Renewable barcoding system and lineage dynamics. a, Experimental
design. Diverse DNA barcodes are introduced into an initially clonal 
population, with each barcode labelling a small lineage. Every 100 generations 
(gens), we introduce new diverse barcodes immediately adjacent to existing 
barcodes, thereby subdividing each lineage into sublineages. b, Renewable 
barcoding system. The Cre-lox system consists of three orthogonal lox sites 
(coloured triangles), each of which can be modified with two arm disruptions 
(red) that are individually tolerated but jointly inactivating (Supplementary 
Information section 1). At each barcode addition, we combine arm disruptions
to inactivate the old lox site, while adding a new orthogonal active lox site.
Alternating lox orientations further limit undesired recombination. Drug 
markers (Kan, kanamycin (G418) resistance; Hyg, hygromycin B resistance) 
contain an intron 3’ splice-accepting site and must correctly integrate at the 
landing pad that contains the 5’ splice donor to be functional. c, When the
barcode locus exceeds the length of an Illumina read, we use custom priming 


but not between orthogonal types. Each site can be inactivated by two 
specific arm disruptions (one in each of the two Cre-binding regions), 
but retains high activity with only one disruption. We used this system to 
design barcoded plasmid libraries with complementary Cre-lox architecture,
which we use to integrate barcodes at a designated genomic
‘landing pad’ locus (Fig. 1b, Supplementary Information section 1). 
At each barcode addition, Cre-mediated recombination combines arm 
disruptions to inactivate an old lox site, and adds a new orthogonal lox 
site with a single arm disruption to be used for the next barcode addition 
with a complementary plasmid library (Supplementary Information 
section 1). Each plasmid also contains an inactive drug marker that 
lacks a start codon; correct integration activates this marker by combining
it with a start codon in the landing pad, separated by an artificial
intron. 




sites to sequence overlapping sets of four consecutive barcodes. After 
exploiting barcode diversity to identify and correct sequencing errors, we use 
these overlaps to unambiguously reconstruct the full barcode locus 
(Supplementary Information section 2). d, Inference pipeline. Left, raw 
barcode frequencies over time (left to right; colours chosen at random). For 
legibility, we only show lineages or sublineages with a frequency that exceeds 
0.1% in at least one time point. Combined frequencies of lineages that do not
individually reach 0.1% are shown as white space (or the colour of the parent
when that parent exceeds a frequency of 0.1%). Middle, summary of the model
used for identifying selected lineages (see Supplementary Information
section 4 for details). In brief, we use the data to construct a parametric model
for the strength of noise from genetic drift and sequencing and discard 
trajectories that are explained by noise alone, at a false discovery rate (FDR) of 
5%. We then jointly infer the fitness of all remaining lineages and group lineages 
of indistinguishable fitness into clones (right). 
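
The caption states that trajectories explained by noise alone are discarded at a false discovery rate of 5%; the procedure actually used is described in Supplementary Information section 4 and is not reproduced here. As a generic illustration of FDR control only, the sketch below applies the Benjamini-Hochberg step-up rule to a list of per-lineage P values. The choice of Benjamini-Hochberg, the function name and the example P values are assumptions made for illustration, not the authors' method.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected.

    Generic Benjamini-Hochberg step-up procedure (an assumed stand-in for the
    paper's own noise model, which is not given in the main text).
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * fdr.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    # Hypothetical P values for five lineage trajectories.
    example = [0.001, 0.008, 0.039, 0.041, 0.9]
    print(benjamini_hochberg(example))
```

In this reading, a rejected null hypothesis corresponds to a trajectory that is not explained by drift and sequencing noise alone and is therefore retained for fitness inference.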


This system integrates new DNA barcodes immediately downstream 
of existing barcodes. Each individual thus acquires a string of barcodes 
that encode its ancestry, which can be read by sequencing. We read four 
barcodes per 150-bp paired-end Illumina read; when the barcode locus 
exceeds this length, we exploit overlapping fragments to assemble the 
complete locus (after using high barcode diversity to correct sequenc- 
ing errors) (Fig. 1c, Supplementary Information sections 1.5, 2). This 
allows us to track the frequencies of all lineages and sublineages and 
hence trace the ancestry of the entire population. 
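
Because each individual carries an ordered string of barcodes, lineage and sublineage frequencies follow from grouping reads by a shared barcode prefix. The sketch below is a minimal illustration of that bookkeeping, assuming reads have already been error-corrected and assembled into tuples of barcodes; the data layout, function name and example counts are hypothetical.

```python
from collections import Counter


def lineage_frequencies(read_counts, depth):
    """Frequency of each lineage defined by its first `depth` barcodes.

    `read_counts` maps a tuple of barcodes (oldest first) to a read count.
    Assumes every tuple holds at least `depth` barcodes.
    """
    counts = Counter()
    for barcodes, n in read_counts.items():
        # Reads sharing the same prefix descend from the same (sub)lineage.
        counts[barcodes[:depth]] += n
    total = sum(counts.values())
    return {prefix: n / total for prefix, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts after two rounds of barcoding.
    reads = {("A", "x"): 30, ("A", "y"): 10, ("B", "x"): 60}
    print(lineage_frequencies(reads, depth=1))  # first-round lineages
    print(lineage_frequencies(reads, depth=2))  # second-round sublineages
```

Grouping at depth 1 recovers the original lineages, and each additional level of depth resolves the sublineages created by one further barcoding round.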


Lineage tracking in evolving populations 


We used this system to evolve two diploid yeast populations founded 
from identical clonal ancestors, each labelled with about 50,000 diverse 
Fig. 2 | Inferred clonal dynamics. a, b, Muller diagrams showing dynamics of
inferred beneficial mutations in YPD (a) and YPA (b) populations. Time is
expressed in terms of epoch and generation (for example, 4.100 refers to 
generation 100 of epoch 4). Stars denote the establishment epoch of each new 
beneficial mutation (see Supplementary Information section 5). The opacity of 
the colours denotes the fitness of the corresponding lineage; mutant lineages 


barcodes. We maintained both populations in batch culture, with a
1:1,024 dilution every 24 h (10 generations per day with a bottleneck 
of about 500,000 cells; an effective size (N.) of 5 x 10°). An aliquot was 
frozen daily for analysis (Supplementary Information section 1.4). 
One population was maintained in rich medium (YPD) and the other 
in rich medium with 0.3% acetic acid (YPA), which leads to intracellular
acidification that pilot studies have suggested leads to stronger
selection pressures. In studying these populations, our goal was to
identify generic features of the evolutionary dynamics rather than 
details of differences between conditions. Our choice of environments 
maintains consistency with previous work, which indicates that these 
environments lead to rapid adaptation involving rich dynamics that 
could not be observed using earlier approaches.

We re-barcoded each population every 100 generations with 
about 50,000 additional unique barcodes. This diversity was cho- 
sen to ensure that barcoding does not introduce a substantial bot- 
tleneck; at 10% of the daily bottleneck every 10 days, it does not 
change the scale of genetic drift or the effective population size 
(Supplementary Information section 4.4). It also ensures that we can 
detect relevant selection pressures that act on lineages once those 
lineages become large enough that their dynamics are not domi- 
nated by drift (Supplementary Information section 4.4). However, 
that did not acquire additional beneficial mutations are grey. Grey bars denote
barcoding intervals. c, d, Muller diagrams showing within-lineage dynamics in
select lineages in the YPD (c) and YPA (d) populations. Colours are consistent
with corresponding lineages in a and b. White space indicates periods during 
which the selected lineage was not observed. 


we note that although our barcoding procedure is designed to be 
minimally perturbative, it does involve propagation and selection
steps. Thus, strictly speaking, we are studying evolution in a
fluctuating environment that alternates between ‘evolution’ and 
‘barcoding’ conditions—although, as we see below, the role of these 
fluctuations is minor. 
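
The numbers quoted for the propagation regime can be checked with a short calculation. The sketch below assumes that each daily cycle regrows the culture by exactly the dilution factor, so that the number of generations per cycle is the base-2 logarithm of that factor, and it reads the "10% of the daily bottleneck" as the ratio of new barcodes to bottleneck size; the function names are illustrative.

```python
import math


def generations_per_cycle(dilution_factor):
    """Doublings needed to regrow a culture after a 1:dilution_factor dilution."""
    return math.log2(dilution_factor)


def barcoding_fraction(new_barcodes, bottleneck_size):
    """Number of new barcodes relative to the daily bottleneck size."""
    return new_barcodes / bottleneck_size


if __name__ == "__main__":
    gens = generations_per_cycle(1024)            # 1:1,024 dilution every 24 h
    frac = barcoding_fraction(50_000, 500_000)    # barcodes vs bottleneck cells
    days_per_epoch = 100 / gens                   # 100-generation epochs
    print(gens, frac, days_per_epoch)
```

Under these assumptions a 1:1,024 dilution gives 10 generations per day, re-barcoding every 100 generations falls every 10 days, and about 50,000 barcodes amount to one tenth of the roughly 500,000-cell bottleneck.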

After 1,000 generations of evolution (ten 100-generation ‘epochs’), 
we sequenced the barcode locus at a depth of around 10° reads in every 
frozen time point. This yielded 110 sequenced time points per population
(11 time points per epoch at 10-generation intervals, although we
excluded the final epoch of the YPD population owing to barcoding 
failure; see Supplementary Information section 1.4). We use these data
to infer which lineages contain mutations that are beneficial in either 
evolution or barcoding conditions, and we exploit phylogeny to infer 
in which epoch each mutation was established (that is, within 100-gen- 
eration resolution; Fig. 1d, Supplementary Information section 5). 
This allows us to group barcodes into ‘clones’, each founded by a new
mutation. We then jointly infer the fitness effects of all mutations in 
evolution and barcoding conditions. Because we barcode frequently, 
the dynamics are determined by the average fitness across the two 
conditions (Supplementary Information section 6.2). We therefore use 
this average fitness for the analysis below (although simply neglecting 
Fig. 3 | Travelling wave dynamics. a, b, Inferred distribution of fitness within
the YPD (a) and YPA (b) populations over time. All fitness values are the average across
evolution and barcoding conditions (Supplementary Information section 6.2). Each
coloured bar denotes the frequency and fitness of a corresponding lineage in
Fig. 2. White bars correspond to the ancestor. Black lines denote the inferred 
mean fitness of the population. c, d, Genealogical relationships among 


the barcoding environment leads to qualitatively similar conclusions; 
see Supplementary Information section 6). 

Our ability to detect mutations is limited primarily by genetic drift. 
We cannot identify mutations until they are common enough that 
their fitness effects lead to frequency changes larger than this 
stochastic force (which typically corresponds to lineages at frequencies 
greater than 10“). Because fitness inference requires sufficient time- 
course data, we are also unable to detect most mutations that arise 
in the final 100-200 generations of the experiment (Supplementary 
Information section 5.3). Our analysis thus only identifies a subset of 
beneficial mutations, and our clones are clonal only with respect to 
these. 

We find that in both populations, many beneficial mutations arise 
early in the experiment, founding clones that compete for dominance
(Fig. 2a, b). Some of these clones diversify through further beneficial 
mutations, and a handful obtain multiple mutations, which interfere 
with one another within the parent clone (Fig. 2c, d). In some cases we 
observe multiple nested interference events (Fig. 2c, d). All but the 
largest of these events are undetectable by metagenomic sequencing 
lineages in the YPD (c) and YPA (d) populations show frequent leapfrogging
events. Each clonal lineage is shown at its corresponding fitness. The opacity of
the colours indicates the frequency of the lineage. Colours of the lineages
shown in Fig. 2c, d are consistent with that figure; all other lineages are grey.
Mutational events within highlighted lineages are shown as arrows; each event
arises in one clonal lineage and founds a new lineage at a new fitness.


at approximately 25x depth (which corresponds to about the same 
total number of sequencing reads as our barcode data; Extended Data 
Fig. 1a, b). 

We can also visualize how the fitness composition of the popula- 
tion changes over time (Fig. 3). The population initially diversifies as 
numerous beneficial mutations arise on the ancestral background, 
creating a distribution of fitness within the population (Fig. 3a, b). 
As these clones expand, the mean fitness of the population increases 
(Fig. 3, Extended Data Fig. 2), causing less-fit lineages to fall behind and 
begin to die out. However, diversity is maintained by new beneficial 
mutations, which continuously create clones of even higher fitness 
(Fig. 3c, d). This maintenance of diversity in the face of strong selection 
is an expected feature of rapid adaptation that has been predicted by 
theory but not previously observed directly.
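
The fitness composition described here can be summarized by two numbers at each time point: the frequency-weighted mean fitness and the variance in fitness around it. The sketch below shows that calculation for a list of (frequency, fitness) pairs; the input format and example values are hypothetical, and the inference of the fitness values themselves is not covered.

```python
def fitness_moments(lineages):
    """Frequency-weighted mean and variance of fitness.

    `lineages` is a list of (frequency, fitness) pairs for one time point.
    Frequencies are renormalized so that they need not sum exactly to one.
    """
    total = sum(freq for freq, _ in lineages)
    mean = sum(freq * fit for freq, fit in lineages) / total
    variance = sum(freq * (fit - mean) ** 2 for freq, fit in lineages) / total
    return mean, variance


if __name__ == "__main__":
    # Hypothetical population: ancestor at fitness 0 plus two mutant clones.
    population = [(0.5, 0.00), (0.3, 0.05), (0.2, 0.10)]
    print(fitness_moments(population))
```

Tracking these two quantities over time gives a compact view of the wave: the mean rises as fit clones expand, while new beneficial mutations keep the variance from collapsing.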


Determinants of lineage success 


These dynamics lead to a complex picture of the determinants of suc- 
cess of individual lineages. In the absence of further mutations, the 
Fig. 4 | Travelling wave dynamics and factors determining the success of
mutant lineages. a, Relationship between initial within-population fitness
rank of a mutation that arises in the ancestral background and its maximum
frequency in the second half of the experiment (using the second half avoids
confounding axes in b). n = 35 and n = 47 unique lineages in YPD and YPA,
respectively. Dots represent the mean, and lines show the range of maximum
frequencies in each founding fitness quantile. b, Relationship between the
number of subsequent beneficial mutations landing on the founding clonal
background of a lineage (in the first half of the experiment) and the eventual
maximum frequency of that lineage (in the second half of the experiment). c,


fitness of a lineage should be the only predictor of its success. Yet we
find that the initial fitness of a mutant lineage is only a modest predictor
of its fate (Fig. 4a). Another key factor is whether a lineage acquires further
beneficial mutations (Fig. 4b). Although this is influenced by fitness
(see below), even high-fitness lineages that do not acquire further beneficial
mutations are readily outcompeted, and lower-fitness lineages
that acquire multiple mutations can succeed (Extended Data Fig. 3). The
likelihood of a lineage acquiring further beneficial mutations is in turn
affected by two main factors (Fig. 4c). First, larger lineages have more
opportunities to acquire beneficial mutations. Second, the fitness of a
lineage has a critical role: mutations that arise in a highly fit and hence
rapidly expanding lineage will be less likely to be lost to genetic drift.
Effect of lineage frequency and fitness on the likelihood of acquiring additional
beneficial mutations. Each point represents the mean frequency and fitness of
a lineage in a given 100-generation epoch; symbol size denotes how many
additional beneficial mutations that lineage acquired (numbers indicate
lineages that acquire more than four). d, Histograms of effect sizes of all
inferred mutations. e, Effect sizes of mutations arising on parental
backgrounds as a function of mean parental relative fitness in the epoch in
which each mutation arose. The region below the grey line corresponds to
mutations that would create lineages less fit than the current mean fitness.


Thus, highly fit backgrounds can accumulate beneficial mutations of
both strong and weak effect, whereas only rare strong mutations can
establish on lower-fitness backgrounds. Consistent with this, our data
show that high-fitness backgrounds acquire both weakly and strongly
beneficial mutations, but low-fitness backgrounds only acquire the
latter (Fig. 4d, e). This means that fitter backgrounds have access to a
larger number of beneficial mutations, creating a rich-get-richer effect
that can lead to bursts of mutations at the expanding front of the fitness
wave. These bursts arise owing to dynamical considerations, and
are not in themselves evidence of historical contingency as a result of
mutator phenotypes or other modifiers of adaptability (Supplementary
Information section 6.5).
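
The reasoning behind this rich-get-richer effect can be made concrete with a textbook approximation that the paper itself does not state: for a new mutation whose net advantage over the current population mean is s, the chance of escaping loss by drift is roughly 2s when s is small and positive, and essentially zero otherwise. The sketch below assumes additive fitness effects and uses this approximation purely for illustration; all names and numbers are hypothetical.

```python
def establishment_probability(background_fitness, effect, mean_fitness):
    """Approximate chance that a new mutation survives genetic drift.

    Uses the classical ~2s rule (an assumption, not taken from the paper),
    where s is the mutant's fitness advantage over the population mean and
    fitness effects are assumed to add.
    """
    s_relative = background_fitness + effect - mean_fitness
    return min(1.0, max(0.0, 2.0 * s_relative))


if __name__ == "__main__":
    mean = 0.03
    # A weak mutation (effect 0.02) on a fit versus an unfit background.
    print(establishment_probability(0.05, 0.02, mean))
    print(establishment_probability(0.00, 0.02, mean))
    # A strong mutation (effect 0.08) can still establish on the unfit background.
    print(establishment_probability(0.00, 0.08, mean))
```

In this simplified picture a weak mutation establishes only when it lands on a background that is already above the mean, whereas a strong mutation can rescue a lagging background, which matches the qualitative pattern reported for Fig. 4d, e.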


These results are qualitatively consistent with recent theory that suggests
that rapidly evolving populations can be described by ‘travelling
wave’ models. In this picture, mutations continuously generate variation
in fitness while selection destroys it by eliminating less-fit genotypes,
leading to a broad distribution of fitness around an increasing
mean (a fitness wave). However, these models have only been analysed
in parameter regimes in which the future common ancestor of a population
is always one of the fittest lineages (although see one previous
study that discusses scenarios in which this can be violated). Instead,
clonal competition in our experiment is characterized by routine ‘leapfrogging’
events, in which lineages of initially low relative fitness acquire
strong beneficial mutations that pull them to prominence, causing
dramatic reversals of fate. For example, in the YPD population (Fig. 3c)
the green lineage, which is the fittest at the start, is leapfrogged by
the orange, blue and purple backgrounds; the blue lineage then falls
behind, only to later leapfrog all others. Similarly, in the YPA population
(Fig. 3d), the brown lineage appears to outcompete the turquoise, red
and yellow lineages, only to be leapfrogged by two strongly beneficial
mutations in a red lineage that is initially much less fit (replay experiments
validate this event; see Supplementary Information section 6.3).

Leapfrogging events not only alter the fates of individual lineages, 
but also cause fluctuations in the fitness distribution and modulate 
the pace of adaptation. Both within-population fitness and genetic 
variation increase during initial diversification before reaching a pla- 
teau as a travelling wave is established (Fig. 3, Extended Data Fig. 4). 
However, leapfrogging can cause fluctuations in this travelling wave: 
the creation of a lineage with anomalously high fitness can lead to a
reduction in diversity at first, but at the same time enable rapid further 
diversification within this lineage that later re-establishes variation 
(Fig. 3c, d, Extended Data Fig. 4). These fluctuations affect the success 
of any individual mutation and the dynamics of the travelling wave, 
and hence have a major role in determining the outcomes of evolution.


Discussion 


Previous theory has assumed that the effects of leapfrogging and fluctuations
are occasional perturbations that can be largely ignored.
Our results suggest that they instead have a central role. Although
our system involves microbial populations of modest size, the importance
of these effects is expected to depend only weakly on population
size and mutation rate (because relevant timescales only depend
logarithmically on these quantities). Thus our results suggest that
leapfrogging and fluctuations may be routine in the evolution of a wide
range of microorganisms and viruses. A new theoretical framework is
essential to develop accurate models of evolution in these systems.
The renewable barcoding approach we have introduced here offers the
potential to test these models, and to observe evolutionary dynamics
in a variety of contexts at sufficient resolution to investigate the role
of other factors such as frequency-dependent selection or mutations
that alter the adaptability of individual lineages.
Further information on research design is available in the Nature
Research Reporting Summary linked to this paper. 
Extended Data Fig. 2 | Comparison of inferred and measured population
mean fitness trajectories. All fitness measurements and inferences refer to
the evolution environment only. Trajectories have been offset to agree with the
fitness assay at time point 3.100. Dots denote barcoding intervals. Shaded
regions around the trajectories denote estimates of 95% confidence intervals
for the inferred mean fitness trajectory, which often do not exceed the width of
the lines (Supplementary Information section 6.1). In the case of the YPA
population, lighter colours denote mean fitness trajectories over the last two
epochs, offset to agree with fitness assays in the last time point (see
Supplementary Information section 6.6 for a discussion of potential reasons
for these discrepancies). FACS, fluorescence-activated cell sorting.
Extended Data Fig. 3 | Predictors of the success of lineages. The size of each
dot denotes the number of later beneficial mutations that occur in the
founding clonal background of a lineage (in the first half of the experiment).
Extended Data Fig. 4 | Genetic variation over time. a, Total number of lineages
above a threshold frequency (0.01%) over time. Bars denote the number of new
lineages that arise in each 100-generation interval. b, Genetic diversity within
each population over time, as measured by entropy (Supplementary
Information section 6.4). c, Variance in fitness over time. d, Fitness diversity
within each population over time, as measured by fitness entropy. Fitness
entropy quantifies how fitness variance is distributed among lineages
(Supplementary Information section 6.4).
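
The exact entropy definitions used for panels b and d are given in Supplementary Information section 6.4 and are not reproduced in this caption. As a rough illustration of how genetic diversity can be scored from lineage frequencies, the sketch below computes the Shannon entropy in nats; treating the paper's "entropy" as Shannon entropy is an assumption, and fitness entropy is not attempted.

```python
import math


def shannon_entropy(frequencies):
    """Shannon entropy (in nats) of a set of lineage frequencies.

    Frequencies are renormalized to sum to one; zero entries contribute nothing.
    """
    total = sum(frequencies)
    entropy = 0.0
    for f in frequencies:
        if f > 0:
            p = f / total
            entropy -= p * math.log(p)
    return entropy


if __name__ == "__main__":
    print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # four equally common lineages
    print(shannon_entropy([0.97, 0.01, 0.01, 0.01]))  # one dominant lineage
```

Entropy is highest when many lineages share the population evenly and falls towards zero as a single lineage takes over, so a sweep appears as a dip and renewed diversification as a recovery.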
One of the most abundant sources of organic carbon in the ocean is glycolate, the
secretion of which by marine phytoplankton results in an estimated annual flux of one
petagram of glycolate in marine environments. Although it is generally accepted that
glycolate is oxidized to glyoxylate by marine bacteria, the further fate of this C2
metabolite is not well understood. Here we show that ubiquitous marine
Proteobacteria are able to assimilate glyoxylate via the β-hydroxyaspartate cycle
(BHAC) that was originally proposed 56 years ago. We elucidate the biochemistry of
the BHAC and describe the structure of its key enzymes, including a previously
unknown primary imine reductase. Overall, the BHAC enables the direct production
of oxaloacetate from glyoxylate through only four enzymatic steps, representing, to
our knowledge, the most efficient glyoxylate assimilation route described to date.
Analysis of marine metagenomes shows that the BHAC is globally distributed and on
average 20-fold more abundant than the glycerate pathway, the only other known
pathway for net glyoxylate assimilation. In a field study of a phytoplankton bloom, we
show that glycolate is present in high nanomolar concentrations and taken up by
prokaryotes at rates that allow a full turnover of the glycolate pool within one week.
During the bloom, genes that encode BHAC key enzymes are present in up to 1.5% of
the bacterial community and actively transcribed, supporting the role of the BHAC in
glycolate assimilation and suggesting a previously undescribed trophic interaction
between autotrophic phytoplankton and heterotrophic bacterioplankton.


Global net primary production has been estimated to be approximately
100 petagrams of carbon per year, equal parts of which are produced
in terrestrial and marine habitats. In the oceans, more than a third of
primary production can be released into the water column by phytoplankton
as dissolved organic carbon, generating a plethora of substrates
for heterotrophic bacterioplankton. An abundant component
of the pool of dissolved organic carbon is the carboxylic acid glycolate,
which is released as a photorespiratory waste product of marine
autotrophs. Concentrations of glycolate in the nanomolar-to-low
micromolar range have been measured in different marine habitats
(Extended Data Fig. 1), and the compound is readily taken up by
bacterioplankton. The first step in glycolate metabolism is its oxidation
to glyoxylate, which is catalysed by the enzyme glycolate oxidase. The
abundance and transcription of the glcD gene, which encodes a subunit
of glycolate oxidase, have previously been used to investigate bacterial
groups that are capable of glycolate utilization. However, it has been
assumed that glycolate is the subject of bacterial oxidation mainly to
conserve energy; the further fate of glyoxylate has not been described
in detail. For SAR11 bacteria, it has been shown that glyoxylate can be
used to replace the obligate glycine requirement. In SAR11 and other
bacteria, glyoxylate can be co-assimilated by malate synthase into the
tricarboxylic acid cycle or directly assimilated into central carbon
metabolism through the well-studied glycerate pathway. An alternative
solution is the BHAC, which has been previously proposed
to operate in the Alphaproteobacterium Paracoccus denitrificans.
However, the complete reaction sequence and the proteins comprising
this pathway and their detailed biochemistry have remained unknown
for the past 56 years.

On the basis of the sequence of a putative β-hydroxyaspartate aldolase
gene (dhaa; GenBank accession number ABO75600) from P. denitrificans
IFO 13301, we identified a homologue in the genome of P.
denitrificans DSM 413 (BLT64_RS06500), annotated as a DSD1 family
pyridoxal 5′-phosphate (PLP)-dependent enzyme. This gene is part of
a gene cluster, which consists of four structural genes and a putative
transcriptional regulator that we termed bhcABCD and bhcR (Fig. 1a).
In addition to the gene for the putative β-hydroxyaspartate aldolase
(bhcC), the cluster comprises the open reading frames that encode a
putative PLP-dependent aminotransferase (BLT64_RS06510, bhcA), a
putative serine/threonine dehydratase (BLT64_RS06505, bhcB) and
a putative ornithine cyclodeaminase (BLT64_RS06495, bhcD). The
Fig. 1 | The BHAC. a, Genetic structure of the bhc gene cluster in P. denitrificans
DSM 413. b, Reaction sequence and net balance of the BHAC. c, Cartoon
representation of the β-hydroxyaspartate aldolase (BhcC) homodimer with
superimposed protein surface (PDB 6QKB). d, Cartoon representation of the
iminosuccinate reductase (BhcD) homodimer with superimposed protein
surface (PDB 6RQA).


putative transcriptional regulator (BLT64_RS06515), annotated as an
IclR-family regulator, is located in the opposite orientation to the four
structural genes. 

We expressed and characterized the four enzymes that are encoded
in the gene cluster. BhcA is a PLP-dependent aminotransferase that
transaminates glyoxylate into glycine using aspartate as the preferred
amino group donor. BhcB functions as a β-hydroxyaspartate dehydratase.
BhcC is a β-hydroxyaspartate aldolase, the key enzyme of the
BHAC that catalyses the condensation of glyoxylate and glycine into
β-hydroxyaspartate. This enzyme is closely related to D-threonine aldolases
(Extended Data Fig. 2). The crystal structure of β-hydroxyaspartate
aldolase that we solved at 1.7 Å (Protein Data Bank (PDB) 6QKB) shows
that the three amino acids A160, A195 and S313 distinguish the active
site of BhcC from that of D-threonine aldolases, providing a signature
sequence for this enzyme family (Fig. 1c, Extended Data Fig. 2 and
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Fig. 2| Reaction sequence catalysed by B-hydroxyaspartate dehydratase 
(BhcB) and iminosuccinate reductase (BhcD). a, Overview of the relevant 
reactions. b, Production of L-aspartate (red) from (2R, 3S)-B-hydroxyaspartate 
(green) by BhcB and reduction of iminosuccinate by BhcD. c, Production of 
(mono)deuterated aspartate (grey) from (2R, 3S)-B-hydroxyaspartate (green) 
by BhcB and reduction of iminosuccinate via NaBH,CN in D,O. The data 
represent the formation of monodeuterated aspartate; owing to proton 
exchange, di- and trideuterated aspartate can also be formed in small 
quantities. d, Production of oxaloacetate (white) from (2R, 3S)-B- 
hydroxyaspartate (green) by BhcB and subsequent hydrolysis of 
iminosuccinate when neither BhcD nor NaBH;CNare added. b-d, Dataare 
mean +s.d.;n=3 independent experiments. 


Extended Data Table 1). When combined, the BhcABC proteins were 
sufficient to reconstruct a reaction sequence from aspartate and two 
molecules of glyoxylate to two molecules of oxaloacetate and free 
ammonia. However, this left us puzzled about the function of the fourth 
open reading frame, the putative ornithine cyclodeaminase (bhcD). 
When we tested BhcD in combination with BhcB, we discovered that it
functions as an imine reductase (IRED) that accepts a labile
iminosuccinate intermediate formed by the latter enzyme (Fig. 2a, b).
We used sodium cyanoborohydride trapping to demonstrate that BhcB
produces iminosuccinate (Fig. 2c). Although this compound spontaneously
decays into free ammonia and oxaloacetate in solution (Fig. 2d),
iminosuccinate is reduced to L-aspartate in the presence of BhcD,
thereby regenerating the amino group donor for the first step of the
BHAC. IREDs are extensively investigated owing to their
biotechnological potential. Almost all IREDs described to date act on
secondary imines, whereas the reduction of a free primary imine, as
catalysed by BhcD, has not previously been described. The enzymatic
reduction of primary imines is known only as part of the reaction
sequence in glutamate dehydrogenase and as part of a non-physiological
side reaction of ketimine reductases. The crystal structure of BhcD,
which we solved to a resolution of 2.6 Å (PDB 6RQA), shows major
differences in the active site compared to L-alanine dehydrogenase from
Archaeoglobus fulgidus (PDB 1OMO), the closest structural homologue
within the ornithine cyclodeaminase/µ-crystallin enzyme superfamily
(Fig. 1d, Extended Data Fig. 3 and Extended Data Table 1). Phylogenetic
analysis supports these active site differences and reveals that BhcD
and its homologues constitute a novel family of primary IREDs within
the ornithine cyclodeaminase/µ-crystallin superfamily (Extended Data
Fig. 3).
Table 1 | Kinetic parameters of the four enzymes of the BHAC

Enzyme                                        Substrate                    kcat (s⁻¹)    App. KM (mM)    kcat/KM (M⁻¹ s⁻¹)
Aspartate-glyoxylate aminotransferase (BhcA)  Glyoxylate                   58 ± 1        0.43 ± 0.02     1.34 × 10⁵
                                              L-Aspartate                  56 ± 1        2.51 ± 0.10     2.25 × 10⁴
                                              Glycine                      0.76 ± 0.01   9.52 ± 0.40     7.97 × 10¹
                                              Oxaloacetate                 0.76 ± 0.02   2.90 ± 0.27     2.62 × 10²
                                              L-Serine                     8.8 ± 0.3     2.10 ± 0.24     4.20 × 10³
                                              L-Glutamate                  5.0 ± 0.3     20.62 ± 2.33    2.44 × 10²
β-Hydroxyaspartate dehydratase (BhcB)         (2R, 3S)-β-Hydroxyaspartate  35 ± 1        0.20 ± 0.02     1.75 × 10⁵
β-Hydroxyaspartate aldolase (BhcC)            Glyoxylate                   86 ± 4        0.23 ± 0.03     3.72 × 10⁵
                                              Glycine                      91 ± 2        4.31 ± 0.34     2.11 × 10⁴
                                              (2R, 3S)-β-Hydroxyaspartate  33 ± 1        0.28 ± 0.03     1.18 × 10⁵
                                              D-Threonine                  76 ± 2        9.24 ± 0.86     8.25 × 10³
Iminosuccinate reductase (BhcD)               Iminosuccinate               201 ± 10      0.09 ± 0.01     2.29 × 10⁶
                                              NADH                         -             0.02 ± 0.003    -
                                              NADPH                        -             0.33 ± 0.05     -

Data are mean ± s.d., as determined from nonlinear fits of 18 data
points with GraphPad Prism 8. Michaelis-Menten fits of enzyme kinetics
and an SDS-PAGE gel showing purified proteins are provided in Extended
Data Fig. 4 and Supplementary Fig. 1, respectively. For BhcA, kinetics
for glyoxylate and L-aspartate were measured with 20 mM L-aspartate and
5 mM glyoxylate, respectively, and kinetics for glycine and
oxaloacetate were measured with 20 mM oxaloacetate and 30 mM glycine,
respectively. Kinetics for L-serine and L-glutamate were measured with
5 mM glyoxylate. For BhcC, kinetics for glycine and glyoxylate were
measured with 5 mM glyoxylate and 20 mM glycine, respectively.
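
The last column of Table 1 follows from the other two: catalytic efficiency is kcat divided by the apparent KM, with KM converted from mM to M. The short Python sketch below illustrates this with three rows copied from the table; the function name and the dictionary layout are illustrative assumptions, not part of the original analysis, which used GraphPad Prism. Because the table reports rounded kcat and KM values, the recomputed efficiencies can differ slightly in the last digit from the published column.

```python
def catalytic_efficiency(kcat_per_s, km_mM):
    """Return kcat/KM in M^-1 s^-1 from kcat (s^-1) and apparent KM (mM)."""
    if km_mM <= 0:
        raise ValueError("KM must be positive")
    return kcat_per_s / (km_mM * 1e-3)  # convert mM to M before dividing


# (kcat in s^-1, apparent KM in mM) copied from Table 1
table_1_rows = {
    ("BhcA", "glyoxylate"): (58, 0.43),
    ("BhcB", "(2R,3S)-beta-hydroxyaspartate"): (35, 0.20),
    ("BhcC", "glycine"): (91, 4.31),
}

for (enzyme, substrate), (kcat, km) in table_1_rows.items():
    eff = catalytic_efficiency(kcat, km)
    print(f"{enzyme} with {substrate}: {eff:.2e} M^-1 s^-1")
```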


The kinetic parameters of all enzymes of the BHAC are reported in
Table 1. The complete reaction sequence of the pathway is shown in
Fig. 1b. The cycle extends the originally proposed reaction sequence by
the IRED reaction. Overall, the BHAC converts two molecules of
glyoxylate (C₂) into oxaloacetate (C₄) without the loss of carbon as
CO₂, under consumption of just one reducing equivalent and regeneration
of the catalytic amino donor, which makes it one of the most efficient
glyoxylate assimilation pathways described to date (Supplementary
Table 1). Oxaloacetate formed in the BHAC can directly enter the
tricarboxylic acid cycle or serve as substrate for anabolic reactions.
The pathway is essential for the growth of P. denitrificans in the
presence of glycolate and glyoxylate, and its enzymes are highly
expressed and active in cells grown in the presence of glycolate
(Extended Data Fig. 5). Glyoxylate negatively affected the interaction
of the transcriptional regulator BhcR with the promoter region of the
bhc gene cluster (Extended Data Fig. 5).
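
The net balance stated above can be checked by summing the four enzyme reactions. The sketch below does this bookkeeping in Python. The substrates and products of each step follow the text; the explicit water released by the dehydratase and the proton consumed alongside NADH are standard-chemistry assumptions added here to close the balance, and all names in the code are illustrative.

```python
from collections import Counter

# Stoichiometry per reaction: negative = consumed, positive = produced.
# H2O and H+ are assumptions added to balance the equations.
reactions = {
    "BhcA": {"glyoxylate": -1, "L-aspartate": -1, "glycine": 1, "oxaloacetate": 1},
    "BhcC": {"glyoxylate": -1, "glycine": -1, "beta-hydroxyaspartate": 1},
    "BhcB": {"beta-hydroxyaspartate": -1, "iminosuccinate": 1, "H2O": 1},
    "BhcD": {"iminosuccinate": -1, "NADH": -1, "H+": -1, "L-aspartate": 1, "NAD+": 1},
}

# Carbon atoms per molecule for the species that survive in the net balance.
carbons = {"glyoxylate": 2, "oxaloacetate": 4, "H2O": 0, "H+": 0, "NADH": 21, "NAD+": 21}


def net_balance(rxns):
    """Sum all reactions and drop metabolites that cancel out."""
    total = Counter()
    for stoich in rxns.values():
        for metabolite, coeff in stoich.items():
            total[metabolite] += coeff
    return {m: c for m, c in total.items() if c != 0}


def carbon_balance(net):
    """Net change in carbon atoms; zero means no carbon is lost."""
    return sum(coeff * carbons[m] for m, coeff in net.items())


net = net_balance(reactions)
print(net)
print("carbon balance:", carbon_balance(net))
```

In this bookkeeping the amino donor (L-aspartate) and all pathway intermediates cancel, leaving two glyoxylate and one NADH consumed per oxaloacetate formed, with no carbon lost.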

We next studied the phylogenetic distribution of the BHAC. The bhc gene
cluster is widespread among the Rhizobiales and Rhodobacterales orders
of the Alphaproteobacteria, and is also found in several
gammaproteobacterial orders (Extended Data Fig. 6). Most of these
bacteria were isolated from marine habitats, and the Roseobacter group
within the Rhodobacterales is one of the three major bacterial groups
responding to phytoplankton blooms; Roseobacter-group bacteria can
constitute up to 15% of the bacterial community in these blooms.
Notably, 94% of the isolates with the bhc gene cluster also encode
glycolate oxidase in their genomes, enabling them to oxidize glycolate
to glyoxylate for subsequent assimilation by the BHAC (Supplementary
Data 1). BhcC is also ubiquitously present in marine metagenomes
collected on the Tara Oceans expedition (Extended Data Fig. 7 and
Supplementary Data 2), suggesting that the BHAC functions in glycolate
assimilation in marine environments worldwide. Notably, the BHAC
(represented by BhcC) is on average 20-fold more abundant than the
glycerate pathway (represented by Gcl) in these datasets (Extended
Data Fig. 7d).

To investigate the ecological importance of the BHAC, we focused our
analyses on Helgoland (Extended Data Fig. 8a, b), an island in the
North Sea that has already been used extensively as a study site to
investigate the succession of bacterial populations during algal
blooms. We analysed metagenomes from seawater samples collected between
2010 and 2012 at Helgoland and detected the bhc gene cluster in all
years at intermediate abundances (up to 3 reads per kilobase per
million reads (RPKM), corresponding to roughly 1.5% of all cells)
(Extended Data Fig. 8c-e and Supplementary Data 3). To further
investigate the role of the BHAC in situ, we monitored the spring
phytoplankton bloom at Helgoland from March to May 2018. We determined
chlorophyll a (Chl a) fluorescence as proxy for phytoplankton biomass
and total microbial cell counts for each working day. Glycolate
concentrations in the seawater were determined weekly.
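
For readers unfamiliar with the RPKM unit used above, the following sketch shows the standard normalization: mapped reads divided by gene length in kilobases and by sequencing depth in millions of reads. The read counts and gene length in the example are hypothetical and chosen only to give a value of the same order as the abundances reported here; the conversion from RPKM to a fraction of cells is taken from the cited literature and is not reproduced.

```python
def rpkm(mapped_reads, gene_length_bp, total_reads):
    """Reads per kilobase of gene per million sequenced reads."""
    if gene_length_bp <= 0 or total_reads <= 0:
        raise ValueError("gene length and total reads must be positive")
    length_kb = gene_length_bp / 1e3
    depth_millions = total_reads / 1e6
    return mapped_reads / (length_kb * depth_millions)


# Hypothetical example: 36 reads mapping to a 1,200-bp gene
# in a metagenome of 10 million reads.
print(rpkm(mapped_reads=36, gene_length_bp=1200, total_reads=10_000_000))
```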

The 2018 spring bloom was dominated by pennate diatoms and consisted of
two peaks in phytoplankton growth in late April and late May (Fig. 3a).
We determined a background concentration of glycolate in the seawater
of 300 nM before the bloom, which is in line with previous measurements
(600 ± 340 nM) (Extended Data Fig. 1). During the bloom, from early
March to late May, glycolate concentrations increased by approximately
350 nM (Fig. 3b), indicating the accumulation of phytoplankton-derived
glycolate. At three time points in April and May, before and during the
algal bloom, we determined bulk uptake rates of glycolate in the
seawater. Glycolate uptake rates were in line with previously reported
values and increased more than threefold from 1.46 nM h⁻¹ to
4.68 nM h⁻¹ between the first and the last measurement (Fig. 3b),
indicating that the capacity for glycolate uptake had multiplied by the
same factor as the total microbial cell counts. Notably, these rates
are comparable to uptake and consumption rates for
dimethylsulfoniopropionate in the open ocean and would enable a
turnover of the total glycolate pool at our sampling site every
5-10 days. The bhc gene cluster was prevalent during the progression of
the phytoplankton bloom. bhcC genes were detected at all of the time
points, with the highest abundance per cell (around 1.5%) during the
peaks of the phytoplankton bloom in April and May (Fig. 3c, Extended
Data Fig. 9 and Supplementary Data 4). Transcription of bhcC was
confirmed before and during the spring bloom (Fig. 3d, e), indicating
that the BHAC is an active route for glycolate assimilation in the
ocean.
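
The turnover estimate of 5-10 days follows from dividing the glycolate pool by the bulk uptake rate. The sketch below reproduces that arithmetic with the concentrations and rates quoted above. Pairing the pre-bloom concentration with the first uptake rate and the late-bloom concentration with the last one is an assumption made here for illustration, as is the premise that the uptake rate stays constant while the pool turns over.

```python
def turnover_days(pool_nM, uptake_nM_per_h):
    """Time in days needed to take up the whole pool at a constant rate."""
    if uptake_nM_per_h <= 0:
        raise ValueError("uptake rate must be positive")
    return pool_nM / uptake_nM_per_h / 24.0


# Values from the text; the pairing of pool and rate is an assumption.
before_bloom = turnover_days(pool_nM=300, uptake_nM_per_h=1.46)
late_bloom = turnover_days(pool_nM=650, uptake_nM_per_h=4.68)

print(f"before bloom: {before_bloom:.1f} days")
print(f"late bloom:   {late_bloom:.1f} days")
```

Both estimates fall inside the 5-10 day window given in the text.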

In summary, our study provides the full reaction sequence and genetic
basis of the BHAC. We demonstrate the biochemistry of the pathway,
which involves a previously unknown family of IREDs, and provide
support for its ecological importance in the assimilation of
phytoplankton-derived dissolved organic carbon. The discovery of the
BHAC as a ubiquitous pathway in marine environments adds a new
dimension to the biochemical cycle of glycolate, an abundant organic
acid in the global oceans. As the BHAC requires only one reducing
equivalent and enables carbon-conserving glycolate assimilation, it may
confer an advantage compared to the glycerate pathway, which releases
CO₂. This may explain the high prevalence of the BHAC in marine
Proteobacteria and could provide a starting point for future studies
that investigate carbon fluxes from phytoplankton to heterotrophic
bacterioplankton.

Fig. 3 | The BHAC during the spring phytoplankton bloom 2018 at
Helgoland. a, From 1 March to 31 May, total microbial cell counts
(grey) and Chl a concentrations (green) were determined each working
day (n = 1). b, The concentration of glycolate (light brown) was
determined once per week using liquid chromatography with mass
spectrometry and increased from approximately 300 nM to around 650 nM.
The uptake rate of bulk glycolate was determined at three time points
through ¹⁴C-glycolate incorporation and uptake rates are indicated.
Data are the mean ± s.d. of n = 5 seawater samples for glycolate
concentrations, and of n = 4 seawater samples for glycolate uptake
rates. c, The bhcC gene copy number per cell (blue circles) was
determined using qPCR. d, The bhcC transcript copy number per cell
(blue triangles) was determined via cDNA synthesis followed by qPCR.
e, bhcC transcript copy number divided by bhcC gene copy number (blue
diamonds). c–e, Data are mean ± s.d.; n = 3 independent experiments.
Methods


Data reporting 

No statistical methods were used to predetermine sample size. The 
experiments were not randomized and the investigators were not 
blinded to allocation during experiments and outcome assessment. 


Chemicals and reagents 
Unless otherwise stated, all chemicals and reagents were acquired from 
Sigma-Aldrich and were of the highest purity available. 


Strains, medium and cultivation conditions 

All strains used in this study are listed in Supplementary Table 2.
Escherichia coli TOP10 (for genetic work), ST18 (for plasmid
conjugation) and BL21 AI (for protein expression) were grown at 37 °C
in lysogeny broth (LB).

P. denitrificans DSM 413 and its derivatives were grown at 30 °C in LB
or in mineral salt medium with TE3-Zn trace elements supplemented with
various carbon sources. To monitor growth, the optical density at
600 nm (OD₆₀₀) of culture samples was determined on a
photospectrometer (Merck Chemicals).


Vector construction 

The genes encoding the four enzymes of the BHAC (bhcABCD) as well 
as the bhcR gene encoding the transcriptional regulator were cloned 
into the standard expression vector pET16b (Merck Chemicals). To this 
end, the respective genes were amplified from genomic DNA of P. deni- 
trificans DSM 413 with the primers provided in Supplementary Table 3. 
The resulting PCR products were digested with suitable restriction 
endonucleases (Thermo Fisher Scientific) as given in Supplementary 
Table 3 and ligated into the expression vector pET16b that had been 
digested with the same enzymes to create a vector for heterologous 
expression of the respective protein. Successful cloning of the desired 
open reading frames was verified by DNA sequencing (Eurofins Genom- 
ics). All plasmids used in this study are listed in Supplementary Table 2. 


Expression and purification of recombinant proteins 

For heterologous overexpression of the BhcA, BhcB, BhcC and BhcD
enzymes, the corresponding plasmid encoding the respective enzyme was
first transformed into chemically competent E. coli BL21 AI cells. The
cells were then grown on LB agar plates containing 100 µg ml⁻¹
ampicillin at 37 °C overnight. A starter culture in selective LB medium
was inoculated from a single colony on the next day and left to grow
overnight at 37 °C in a shaking incubator. The starter culture was used
on the next day to inoculate an expression culture in selective
terrific broth (TB) medium at a 1:100 dilution. The expression culture
was grown at 37 °C in a shaking incubator to an OD₆₀₀ of 0.5-0.7,
induced with 0.5 mM isopropyl-β-D-thiogalactoside (IPTG) and 0.2%
L-arabinose and subsequently grown overnight at 18 °C in a shaking
incubator. Cells were collected at 6,000g for 15 min at 4 °C and cell
pellets were stored at -20 °C until purification of enzymes. Cell
pellets were resuspended in twice their volume in buffer A (300 mM
NaCl, 25 mM Tris-HCl pH 8.0, 15 mM imidazole, 1 mM β-mercaptoethanol,
0.1 mM MgCl₂, 0.01 mM PLP and one tablet of SIGMAFAST protease
inhibitor cocktail, EDTA-free per litre). The cell suspension was
treated with a Sonopuls GM200 sonicator (BANDELIN Electronic) at an
amplitude of 50% to lyse the cells and subsequently centrifuged at
50,000g and 4 °C for 1 h. The filtered supernatant (0.45-µm filter;
Sarstedt) was loaded onto Protino Ni-NTA Agarose (Macherey-Nagel) in a
gravity column, which had previously been equilibrated with 5 column
volumes of buffer A. The column was washed with 20 column volumes of
buffer A and 5 column volumes of 85% buffer A and 15% buffer B and the
His-tagged protein was eluted with buffer B (buffer A with 500 mM
imidazole). The eluate was desalted using PD-10 desalting columns (GE
Healthcare) and buffer C (100 mM NaCl, 25 mM Tris-HCl pH 8.0, 1 mM
MgCl₂, 0.01 mM PLP, 0.1 mM dithiothreitol (DTT)). This was followed by
purification on a size-exclusion column (Superdex 200 pg, HiLoad
16/600; GE Healthcare) connected to an ÄKTA Pure system (GE Healthcare)
using buffer C. The concentrated protein solution (2 ml) was injected,
and the flow was kept constant at 1 ml min⁻¹. Elution fractions
containing pure protein were determined via SDS-PAGE analysis on 12.5%
gels. Purified enzymes in buffer C were used for crystallization or
stored at -20 °C in buffer C containing 50% glycerol for later use in
enzymatic assays.

BhcR was expressed and purified in the same way, except that buffer A
contained 100 mM KCl, 20 mM HEPES-KOH pH 7.5, 10 mM MgCl₂, 4 mM
β-mercaptoethanol, 5% glycerol and one tablet of SIGMAFAST protease
inhibitor cocktail, EDTA-free per litre. Buffer C contained 100 mM KCl,
20 mM HEPES-KOH pH 7.5, 10 mM MgCl₂, 5% glycerol and 1 mM DTT.

NADH-dependent malate dehydrogenase (Mdh) and NADPH-dependent
glyoxylate reductase (GhrA) from E. coli were overexpressed using the
respective strains from the ASKA collection. A starter culture in
selective LB medium (34 µg ml⁻¹ chloramphenicol) was inoculated from a
single colony and left to grow overnight at 37 °C in a shaking
incubator. The starter culture was used on the next day to inoculate
an expression culture in selective TB medium at a 1:100 dilution. The
expression culture was grown at 37 °C in a shaking incubator to an
OD₆₀₀ of 0.6, induced with 0.5 mM IPTG and grown another 4 h at 37 °C
in a shaking incubator. The enzymes were affinity-purified in the same
way as described above, except that buffer A contained 200 mM NaCl,
50 mM potassium phosphate pH 7.0, 15 mM imidazole, 1 mM
β-mercaptoethanol and one tablet of SIGMAFAST protease inhibitor
cocktail, EDTA-free per litre. Buffer C contained 100 mM NaCl, 50 mM
potassium phosphate pH 7.0 and 0.1 mM DTT. The purified enzyme was
stored at -20 °C in buffer C containing 50% glycerol.


Enzyme activity assays 
For all enzyme assays, the oxidation of NADH or NADPH was followed at
340 nm or 360 nm on a Cary 60 UV-Vis photospectrometer (Agilent) in
quartz cuvettes with a path length of 1 mm or 10 mm (Hellma Optik).

The enzyme assay to determine the kinetic parameters of BhcA with
glyoxylate and L-aspartate as substrates was performed at 30 °C in a
total volume of 300 µl. The reaction mixture contained 100 mM potassium
phosphate buffer pH 7.5, 0.1 mM PLP, 0.2 mM NADH, different amounts of
the respective substrates and 32 nM BhcA. Five hundred nanomolar Mdh
was added as a coupling enzyme to convert oxaloacetate into malate.
Kinetics for glyoxylate were measured with 20 mM L-aspartate; kinetics
for L-aspartate were measured with 5 mM glyoxylate. To determine the
kinetic parameters with oxaloacetate and glycine as substrates, the
same assay mixture was used and 3 µM GhrA was added as a coupling
enzyme to convert glyoxylate into glycolate. Kinetics for glycine were
measured with 20 mM oxaloacetate; kinetics for oxaloacetate were
measured with 30 mM glycine. To determine the kinetic parameters with
L-serine or L-glutamate and glyoxylate as substrates, the same assay
mixture was used and BhcB (3 µM), BhcC (1 µM) and Mdh (500 nM) were
added as coupling enzymes. Kinetics for L-serine and L-glutamate were
measured with 5 mM glyoxylate.

The enzyme assay to determine the kinetic parameters of BhcB was
performed at 30 °C in a total volume of 300 µl. The reaction mixture
contained 100 mM potassium phosphate buffer pH 7.5, 0.1 mM PLP, 0.2 mM
NADH, different amounts of the substrate (2R, 3S)-β-hydroxyaspartate
and 29 nM BhcB. Five hundred and eighty nanomolar BhcD was added as a
coupling enzyme to convert iminosuccinate into L-aspartate.
(2R, 3S)-β-Hydroxyaspartate was custom-synthesized by NewChem and was
determined to be >95% pure by NMR analysis.

The enzyme assay to determine the kinetic parameters of BhcC with
glyoxylate and glycine as substrates was performed at 30 °C in a total
volume of 300 µl. The reaction mixture contained 100 mM potassium
phosphate buffer pH 7.5, 0.1 mM PLP, 0.2 mM NADH, 0.5 mM MgCl₂,
different amounts of the respective substrates and 4 nM BhcC. BhcB
(200 nM) and BhcD (2 µM) were added as coupling enzymes. Kinetics for
glycine were measured with 5 mM glyoxylate; kinetics for glyoxylate
were measured with 20 mM glycine. To determine the kinetic parameters
with (2R, 3S)-β-hydroxyaspartate as substrate, the same assay mixture
was used and 3 µM GhrA was added as a coupling enzyme to convert
glyoxylate into glycolate. To determine the kinetic parameters with
D-threonine as substrate, the same assay mixture was used and 3 µM
alcohol dehydrogenase from Saccharomyces cerevisiae (Sigma-Aldrich) was
added as coupling enzyme to convert acetaldehyde into ethanol.
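
In coupled assays of this kind, the reaction rate is read from the decrease in NADH absorbance via the Beer-Lambert law. The sketch below shows that conversion under stated assumptions: it uses the widely cited molar extinction coefficient of NADH at 340 nm (6,220 M⁻¹ cm⁻¹), assumes that one NADH is oxidized per product molecule formed, and uses a hypothetical absorbance slope. It is not the authors' analysis script, and a different coefficient would be needed for readings at 360 nm.

```python
EPSILON_NADH_340 = 6220.0  # M^-1 cm^-1, NADH at 340 nm


def nadh_rate_M_per_s(dA_per_min, path_cm=1.0, epsilon=EPSILON_NADH_340):
    """Convert an absorbance slope (per minute) into a rate in M s^-1."""
    if path_cm <= 0 or epsilon <= 0:
        raise ValueError("path length and epsilon must be positive")
    # NADH oxidation lowers the absorbance, so the slope is negative;
    # the rate is reported as a positive number.
    return abs(dA_per_min) / (epsilon * path_cm) / 60.0


def turnover_per_s(rate_M_per_s, enzyme_M):
    """Apparent turnover number (s^-1) at a given enzyme concentration."""
    return rate_M_per_s / enzyme_M


# Hypothetical slope of -0.10 absorbance units per minute in a 10-mm
# (1.0 cm) cuvette, with 32 nM enzyme as in the BhcA assay.
rate = nadh_rate_M_per_s(-0.10, path_cm=1.0)
print(f"rate: {rate * 1e6:.3f} uM s^-1")
print(f"turnover: {turnover_per_s(rate, 32e-9):.1f} s^-1")
```

For the 1-mm cuvettes mentioned above, `path_cm` would be 0.1.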

The enzyme assay to determine the apparent kinetic parameters of BhcD
was performed at 30 °C in a total volume of 250 µl. The reaction
mixture contained 100 mM potassium phosphate buffer pH 7.5, 0.2 mM
NADH, 0.1 mM PLP, different amounts of (2R, 3S)-β-hydroxyaspartate, and
appropriate amounts of the enzymes BhcB and BhcD. Kinetics for
iminosuccinate were measured with 15 nM BhcD, different amounts of
(2R, 3S)-β-hydroxyaspartate and BhcB. To a given amount of
(2R, 3S)-β-hydroxyaspartate, a tenfold molar excess of BhcB was added
to start the reaction and completely and almost instantly convert the
substrate pool into iminosuccinate. The initial reaction velocity of
BhcD was determined after a mixing period of 3 s and the apparent
concentration of iminosuccinate at this point in time was calculated on
the basis of previously published values. Kinetics for NADH and NADPH
were measured with 2 mM (2R, 3S)-β-hydroxyaspartate, 214 nM BhcB, 28 nM
BhcD and different amounts of the respective cofactor. No activity was
measurable in a reaction mixture containing 100 mM potassium phosphate
buffer pH 7.5, 0.2 mM NADH, 0.1 mM PLP and 3 mM oxaloacetate as well as
9 mM ammonium as putative substrates for BhcD.

The enzyme assay to generate iminosuccinate from
(2R, 3S)-β-hydroxyaspartate (catalysed by BhcB) and further chemical
reduction of iminosuccinate to L-aspartate with the reducing agent
NaBH₃CN was performed at 30 °C in a total volume of 1 ml. The reaction
mixture contained 50 mM Tris pH 7.5, 1 mM (2R, 3S)-β-hydroxyaspartate,
0.1 mM PLP, 1 mM MgCl₂, 214 nM BhcB and 1 mM NaBH₃CN. The reaction was
carried out in D₂O. Aliquots of 180 µl were taken after 0, 0.5, 1, 2
and 3 min and the reaction was immediately stopped by quenching with
formic acid (4% final concentration). The samples were centrifuged at
17,000g and 4 °C for 15 min and the supernatant diluted 1:4 in
double-distilled water for liquid chromatography-mass spectrometry
(LC-MS) analysis. In negative control experiments, NaBH₃CN was omitted
from the reaction mixture. The same experiment was performed with added
BhcD instead of NaBH₃CN to enzymatically reduce iminosuccinate to
L-aspartate. The reaction mixture contained 50 mM Tris pH 7.5, 1 mM
(2R, 3S)-β-hydroxyaspartate, 2 mM NADH, 0.1 mM PLP, 1 mM MgCl₂, 214 nM
BhcB and 28 nM BhcD.

LC-MS measurements were performed using an Agilent 6550 iFunnel Q-TOF
LC-MS system equipped with an electrospray ionization (ESI) source set
to negative ionization mode. LC was carried out as follows. The
analytes were separated on an aminopropyl column (30 mm × 2 mm,
particle size 3 µm, 100 Å; Luna NH2, Phenomenex) using a mobile phase
system consisting of 95:5 20 mM ammonium acetate pH 9.3 (adjusted with
ammonium hydroxide to a final concentration of approximately 10 mM):
acetonitrile (A) and acetonitrile (B). Chromatographic separation was
carried out using the following gradient condition at a flow rate of
250 µl min⁻¹: 0 min, 85% B; 3.5 min, 0% B; 7 min, 0% B; 7.5 min, 85% B;
8 min, 85% B. Column oven and autosampler temperature were maintained
at 15 °C. The ESI source was set to the following parameters: capillary
voltage was set at 3.5 kV and nitrogen gas was used as nebulizing
(20 psig), drying (13 l min⁻¹, 225 °C) and sheath gas (12 l min⁻¹,
400 °C). The Q-TOF mass detector was calibrated before measurement
using an ESI-L Low Concentration Tuning Mix (Agilent) with residuals
and corrected residuals less than 2 ppm and 1 ppm, respectively. MS
data were acquired with a scan range of 50-600 m/z. Autorecalibration
was carried out using 113 m/z as reference mass. Subsequent peak
integration of all analytes was performed using eMZed 2.29.4.0.


Enzyme activity assays in P. denitrificans cell extracts 

P. denitrificans cultures were collected during mid-exponential phase
(OD₆₀₀ of 0.5-0.7), resuspended in ice-cold 100 mM potassium phosphate
buffer (pH 7.2) and lysed by sonication. Cell debris was separated by
centrifugation at 35,000g and 4 °C for 1 h. Total protein
concentrations of the resulting cell-free extracts were determined by
Bradford assay using bovine serum albumin as standard. The assays for
activity of BhcABCD were performed as described above, except that
100 mM potassium phosphate buffer pH 7.5 was replaced with 100 mM Tris
pH 7.5. During BhcD assays, 90 µl samples were taken after 0.5, 1 and
2 min, and the reaction was immediately stopped by quenching with
formic acid (4% final concentration). The samples were centrifuged at
17,000g and 4 °C for 15 min and the supernatant diluted 1:10 in
double-distilled water for LC-MS analysis. L-Malate and L-aspartate in
the samples were quantified using a standard curve of each compound
ranging from 10 µM to 1,000 µM.
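
Quantification against a standard curve can be sketched as a linear fit of peak area against known concentration, followed by inversion for the unknown sample and correction for dilution. The example below uses synthetic peak areas and treats the 1:10 dilution as a tenfold dilution factor; both are assumptions for illustration, and the actual calibration model used with eMZed is not specified in the text.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def quantify(peak_area, slope, intercept, dilution_factor=10.0):
    """Concentration in the undiluted sample, in the units of the standards."""
    return (peak_area - intercept) / slope * dilution_factor


# Synthetic standards spanning 10-1,000 uM (areas are made up).
standards_uM = [10, 50, 100, 500, 1000]
areas = [2.0 * c + 5.0 for c in standards_uM]

slope, intercept = fit_line(standards_uM, areas)
print(f"sample: {quantify(205.0, slope, intercept):.1f} uM")
```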


Genetic modification of P. denitrificans 

The upstream and downstream flanking regions of the bhcABCD genes from
P. denitrificans DSM 413 were cloned into the gene deletion vector
pREDSIX. To this end, the flanking regions were amplified from genomic
DNA of P. denitrificans DSM 413 with the primers given in Supplementary
Table 3. The resulting PCR products were used to perform Gibson
assembly with the vector pREDSIX, which had been digested with MfeI.
Subsequently, the resulting vector was digested with NdeI, and a
kanamycin-resistance cassette, which had been cut out of the vector
pRGD-Kan with NdeI, was ligated into the cut site to generate the final
vectors for gene deletion. For gene deletion of each of the genes
bhcABCD separately and of the complete bhc gene cluster, the
corresponding plasmid was first transformed into chemically competent
E. coli ST18 cells, which were then grown on LB agar plates containing
100 µg ml⁻¹ ampicillin, 50 µg ml⁻¹ kanamycin and 50 µg ml⁻¹
aminolevulinic acid at 37 °C overnight. A culture in selective LB
medium was inoculated the next day and left to grow overnight at 37 °C.
The cultures were diluted the next morning to an OD₆₀₀ of 0.1. A
culture of wild-type P. denitrificans DSM 413 in LB medium was
inoculated from a glycerol stock and grown at 30 °C. ST18 cultures were
collected at an OD₆₀₀ of around 0.7, and the P. denitrificans culture
was collected at an OD₆₀₀ of about 1.3. All cell pellets were washed
once with sterile 10 mM MgSO₄ and resuspended to an OD₆₀₀ of
approximately 10 in sterile 10 mM MgSO₄. Suspensions of ST18 cells and
P. denitrificans cells were mixed in a 2:1 ratio and spotted on minimal
medium agar plates without any carbon source. Plates were incubated at
30 °C overnight. The next morning, spots were removed from the plates,
resuspended in LB medium and plated on LB agar plates containing
25 µg ml⁻¹ kanamycin. Plates were incubated at 30 °C for 3 days. The
respective gene deletion was verified by colony PCR and DNA sequencing
(Eurofins Genomics) and the deletion strain was propagated in selective
LB medium.


High-throughput growth assays with P. denitrificans strains 
Cultures of wild-type P. denitrificans DSM 413 and gene deletion
strains were pre-grown at 30 °C in LB medium containing 25 µg ml⁻¹
kanamycin, when necessary. Cells were collected, washed once with
minimal medium containing no carbon source and used to inoculate growth
cultures of 180 µl minimal medium containing an appropriate carbon
source as well as 25 µg ml⁻¹ kanamycin for gene deletion strains.
Growth in 96-well plates (Thermo Fisher Scientific) was monitored at
30 °C at 600 nm in a Tecan Infinite M200Pro reader (Tecan). The
resulting data were evaluated using GraphPad Prism 8.0.0.


Whole-cell shotgun proteomics 

To acquire the proteome of P. denitrificans growing on different carbon
sources, 30 ml cultures were grown to mid-exponential phase (OD₆₀₀ of
around 0.4) in minimal medium supplemented with 30 mM succinate or
60 mM glycolate. Four replicate cultures were grown for each carbon
source. Main cultures were inoculated from precultures grown in the
same medium at a dilution of 1:1,000. Cultures were collected by
centrifugation at 4,000g and 4 °C for 15 min. The supernatant was
discarded and pellets were washed in 40 ml phosphate-buffered saline
(PBS; 137 mM NaCl, 2.7 mM KCl, 10 mM Na₂HPO₄, 1.8 mM KH₂PO₄, pH 7.4).
After washing, cell pellets were resuspended in 1 ml PBS, transferred
into Eppendorf tubes and centrifuged as described above. Cell pellets
in Eppendorf tubes were snap-frozen in liquid nitrogen and stored at
-80 °C until they were used for the preparation of samples for LC-MS
analysis and label-free quantification.

For protein extraction, bacterial cell pellets were resuspended in 4%
SDS and lysed by heating (95 °C, 15 min) and sonication (Hielscher
Ultrasonics). Reduction was performed for 15 min at 90 °C in the
presence of 5 mM tris(2-carboxyethyl) phosphine followed by alkylation
using 10 mM iodoacetamide at 25 °C for 30 min. The protein
concentration in each sample was determined using the BCA protein assay
kit (Thermo Fisher Scientific) following the manufacturer’s
instructions. Protein clean-up and tryptic digestion were performed
using the SP3 protocol as previously described with minor modifications
regarding protein digestion temperature and solid-phase extraction of
peptides. SP3 beads were obtained from GE Healthcare. Trypsin (1 µg,
Promega) was used to digest 50 µg of total solubilized protein from
each sample. Tryptic digestion was performed overnight at 30 °C.
Subsequently, all protein digestions were desalted using C18 microspin
columns (Harvard Apparatus) according to the manufacturer’s
instructions.

LC-MS/MS analysis of protein digestions was performed ona 
Q-Exactive Plus mass spectrometer connected to an electrospray ion 
source (Thermo Fisher Scientific). Peptide separation was carried out 
using an Ultimate 3000 nanoLC-system (Thermo Fisher Scientific), 
equipped with an in-house-packed C18 resin column (Magic C18 AQ 
2.4 um; Dr. Maisch). The peptides were first loaded onto a C18 precol- 
umn (preconcentration set-up) and then eluted in backflush mode 
with a gradient from 98% solvent A (0.15% formic acid) and 2% solvent 
B (99.85% acetonitrile and 0.15% formic acid) to 25% solvent B over 
105 min, continued from 25% to 35% of solvent B up to 135 min. The flow 
rate was set to 300 nl min”. The data acquisition mode for the initial 
label-free quantification study was set to obtain one high-resolution 
MSscanataresolution of 60,000 (m/z200) witha scanning range from 
375 to 1,500 m/z followed by MS/MS scans of the 10 most intense ions. 
Toincrease the efficiency of MS/MS shots, the charged-state screening 
modus was adjusted to exclude unassigned and singly charged ions. The 
dynamic exclusion duration was set to 30s. Theion accumulation time 
was set to 50 ms (both MS and MS/MS). The automatic gain control was 
set to3 x 10° for MS survey scans and1 10° for MS/MS scans. Label-free 
quantification was performed using Progenesis QI (v.2.0). MS raw files 
were imported into Progenesis and the output data (MS/MS spectra) 
were exported in MGF format. MS/MS spectra were then searched using 
MASCOT (v.2.5) against a database of the predicted proteome from P. 
denitrificans downloaded from the UniProt database (https://www. 
uniprot.org/; download date 26 January 2017), containing 386 com- 
mon contaminant and background proteins that were manually added. 
The following search parameters were used: full tryptic specificity 
required (cleavage after lysine or arginine residues); two missed cleav- 
ages allowed; carbamidomethylation (C) set as a fixed modification; 
and oxidation (M) set as a variable modification. The mass tolerance 
was set to 10 ppm for precursor ions and 0.02 Da for fragment ions for 
high-energy collision dissociation. Results from the database search 
were imported back into Progenesis, mapping peptide identifications 
to MS1 features. The peak heights of all MS1 features annotated with 
the same peptide sequence were summed, and protein abundance was 
calculated per LC-MS run. Next, the data obtained from Progenesis 
were evaluated using the SafeQuant R package v.2.2.2. Then, the 1% 
false-discovery rate of identification and quantification as well as the 
intensity-based absolute quantification values were calculated. 
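
The peak-height summation described above can be illustrated with a minimal sketch. The tuple layout of the input is a hypothetical stand-in for a Progenesis export, and the final step (protein abundance taken as the sum of its peptide totals within one LC-MS run) is an assumption, since the text does not state how peptide values were combined.

```python
from collections import defaultdict


def protein_abundance(features):
    """Sum MS1 peak heights per peptide, then per protein and LC-MS run.

    `features` is an iterable of (run, protein, peptide, peak_height)
    tuples. This layout is hypothetical, not the Progenesis format.
    """
    # Step 1: sum all MS1 features annotated with the same peptide sequence.
    peptide_totals = defaultdict(float)
    for run, protein, peptide, height in features:
        peptide_totals[(run, protein, peptide)] += height

    # Step 2 (assumption): protein abundance per run is the sum of its
    # peptide totals.
    protein_totals = defaultdict(float)
    for (run, protein, _peptide), total in peptide_totals.items():
        protein_totals[(run, protein)] += total
    return dict(protein_totals)


example = [
    ("run1", "BhcA", "PEPTIDEA", 10.0),
    ("run1", "BhcA", "PEPTIDEA", 5.0),
    ("run1", "BhcA", "PEPTIDEB", 2.0),
    ("run2", "BhcA", "PEPTIDEA", 1.0),
]
abundances = protein_abundance(example)
```

The published workflow additionally applied SafeQuant for false-discovery-rate filtering and iBAQ calculation, which this sketch does not attempt to reproduce.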


Electrophoretic mobility shift assays 

Fluorescently labelled DNA fragments for electrophoretic mobility shift 
assays were generated by PCR from genomic DNA of P. denitrificans DSM 
413. For the Pbhc regulatory region, primers Pbhc_fw and Pbhc_rev-dye 
were used to generate a 238-bp fragment containing the putative 
Pbhc promoter. The primers bhcA_fw and bhcA_rev-dye were used to 
generate a 255-bp fragment containing a fragment of the bhcA gene 
as negative control. The primers Pbhc_rev-dye and bhcA_rev-dye were 
5’-labelled with the Dyomics 781 fluorescent dye (Microsynth). Binding 
reactions between the DNA fragments (0.025 pmol), various amounts 
of the purified protein BhcR (400×, 2,000×, 4,000×, 10,000×, 20,000× 
and 40,000× molar excess), and various concentrations of glyoxylate 
(0.01, 0.05, 0.1, 0.2, 0.5 and 1 mM final concentration) were performed 
in buffer A (20 mM potassium phosphate pH 7.0, 1 mM DTT, 5 mM MgCl2, 
50 mM KCl, 15 µg ml⁻¹ bovine serum albumin, 50 µg ml⁻¹ herring sperm 
DNA, 5% v/v glycerol, 0.1% Tween-20) in a total volume of 20 µl. After the 
reaction mixtures were incubated at 37 °C for 20 min, the samples were 
loaded onto a native 5% polyacrylamide gel and electrophoretically 
separated at 110 V for 60 min. BhcR-DNA interactions were detected 
using an Odyssey FC Imaging System (LI-COR Biosciences). 
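
As a worked example of the binding-reaction arithmetic, the following sketch converts the stated molar excesses into protein amounts and concentrations, taking 0.025 pmol of DNA in a 20 µl reaction as given above. The function and variable names are illustrative only.

```python
DNA_PMOL = 0.025          # labelled DNA fragment per reaction
REACTION_VOLUME_UL = 20.0  # total reaction volume
MOLAR_EXCESS = [400, 2_000, 4_000, 10_000, 20_000, 40_000]


def protein_amounts(dna_pmol, volume_ul, excesses):
    """Return (molar excess, pmol BhcR, µM BhcR) for each reaction."""
    rows = []
    for excess in excesses:
        pmol = dna_pmol * excess
        # pmol per µl is numerically equal to µmol per litre (µM).
        conc_um = pmol / volume_ul
        rows.append((excess, pmol, conc_um))
    return rows


rows = protein_amounts(DNA_PMOL, REACTION_VOLUME_UL, MOLAR_EXCESS)
```

Under these numbers, the lowest excess corresponds to about 10 pmol of BhcR (0.5 µM) and the highest to about 1,000 pmol (50 µM).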


Crystallization and structure determination of BhcC and BhcD 
The sitting-drop vapour-diffusion method was used for crystallization 
at 16 °C. Purified BhcC (10 mg ml⁻¹) was mixed in a 1:1 ratio with solution 
A containing 20% PEG 3350, 0.2 M ammonium chloride, pH 6.3 
(final drop volume 1.4 µl). Reservoirs were filled with 40 µl solution 
A. Crystals appeared within 14 days. Crystals were briefly soaked in 
mother liquor supplemented with 40% glycerol for cryoprotection 
before freezing in liquid nitrogen. 

Purified BhcD (10 mg ml⁻¹) was mixed in a 1:1 ratio with solution 
B containing 20% PEG 3350, 0.2 M Mg(NO3)2, 5 mM NAD+ and 5 mM 
Tb-Xo4, pH 6.4 (final drop volume 4 µl). Various additives were tested to 
improve crystal quality and size. The best results were achieved with the 
recently described nucleating and phasing agent Tb-Xo4. Reservoirs 
were filled with 114 µl of solution B. Crystals appeared within a week. 
Crystals were briefly soaked in mother liquor supplemented with 40% 
ethylene glycol for cryoprotection before freezing in liquid nitrogen. 

X-ray diffraction data were collected at the beamlines ID29 and ID30B 
of the ESRF (Grenoble, France) and at beamline P13 of DESY (Hamburg, 
Germany). The data were processed with the XDS (build 20180126) and 
CCP4 v.7.0 software packages. The structures were solved by molecular 
replacement. For BhcC, the structure of a D-threonine aldolase (PDB 
4V15) served as search model. For BhcD, a homology model was made 
based on the structure of L-alanine dehydrogenase (PDB 1OMO) using 
Swiss-Model. This homology model was then used as search model for 
the molecular replacement. Molecular replacement was carried 
out using Phaser from the Phenix software package (v.1.14); the models 
were built with Phenix.Autobuild and refined with Phenix.Refine. Additional 
modelling, manual refinement and ligand fitting were done in Coot (v.0.8.9). 
Final positional and B-factor refinements, as well as water picking, 
were performed using Phenix.Refine. The structure models for BhcC 
and BhcD were deposited at the Protein Data Bank in Europe (PDBe) 
under PDB accession numbers 6QKB and 6RQA, respectively. Figures 
were made using Pymol 1.8. 


Analysis of North Sea metagenome data 

Searches for the bhc gene cluster in 38 assembled surface seawater 
metagenomes sampled at the island of Helgoland between 2010 and 
2012 were performed using the Ruegeria pomeroyi DSS-3 bhc gene 
cluster proteins as reference (NCBI protein IDs WP_011241924.1 (BhcR), 
WP_011241925.1 (BhcA), WP_011241926.1 (BhcB), WP_011241927.1 
(BhcC), WP_044029519.1 (BhcD)). All identified proteins of the 38 
metagenomes were searched against these proteins using DIAMOND 
BLASTp and post-filtered to those hits for which the entire gene cluster 
could be detected on a metagenome contig. These contigs were, if possible, 
linked to metagenome-assembled genomes (MAGs) binned from 
the same 38 metagenomes. MAGs were binned as previously described 
and both the metagenome assemblies and MAGs are accessible under 
accession PRJEB28156 at the European Nucleotide Archive (ENA). MAG 
quality was assessed using CheckM v.1.0.7. Abundance estimates of 
MAGs and the single unbinned contig were calculated based on read 
mapping as reads per kilobase per million reads (RPKM; 2 RPKM = 1% 
relative abundance detected by fluorescence in situ hybridization). 
Read mapping of all 38 metagenomes to MAGs and the single unbinned 
contig was performed as previously described using BBMap v.35.14 
(http://bbtools.jgi.doe.gov). 
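
The RPKM measure and the stated rule of thumb (2 RPKM corresponding to 1% relative abundance) can be written out as a short sketch. The function names and the example read counts are hypothetical; the formula is the standard definition of reads per kilobase per million reads, not code taken from the study.

```python
def rpkm(mapped_reads, length_bp, total_reads):
    """Reads per kilobase of target per million reads in the metagenome."""
    kilobases = length_bp / 1_000
    millions = total_reads / 1_000_000
    return mapped_reads / (kilobases * millions)


def approx_relative_abundance_percent(rpkm_value, rpkm_per_percent=2.0):
    """Convert RPKM to % relative abundance using the 2 RPKM = 1% rule."""
    return rpkm_value / rpkm_per_percent


# Hypothetical example: 4,000 reads mapped to a 2-Mb MAG in a
# metagenome of 1 million reads.
value = rpkm(4_000, 2_000_000, 1_000_000)
percent = approx_relative_abundance_percent(value)
```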


Phylogenetic analyses 

A genome tree of bacterial strains and five MAGs with the bhc gene 
cluster was calculated using GTDBtk v.0.1.3 with GTDB v.86. GTDBtk 
uses an alignment of 120 bacterial marker genes to infer taxonomic 
relationships. The tree calculated by GTDBtk was subsampled to the 264 
bacterial strains containing the bhc gene cluster and the MAGs, and 
visualized using iTOL. 

Sequences of BhcABCD from 264 bacterial isolates and 6 metage- 
nome contigs (five of which were linked to MAGs) were aligned using 
MUSCLE, manually curated to remove gaps and concatenated. A phylogenetic 
tree of concatenated sequences of BhcABCD was calculated 
with raxmlGUI 1.5b2 using the PROTGAMMA model with Le-Gascuel 
substitution matrix, 100 bootstraps and 100 maximum-likelihood 
resamplings. The resulting tree was visualized using iTOL. 

Sequences from the ornithine cyclodeaminase/µ-crystallin 
superfamily (Conserved Domain accession cl27428) and the type 
III PLP-dependent enzymes superfamily (Conserved Domain acces- 
sion cl00261) were downloaded from the NCBI protein database and 
aligned using MUSCLE. Phylogenetic trees of the aligned sequences 
were calculated with raxmlGUI 1.5b2 using the PROTGAMMA model 
with Le-Gascuel substitution matrix, 100 bootstraps and 100 maximum- 
likelihood resamplings. The resulting trees were visualized using iTOL. 

In total, 1,614 protein sequences from the ornithine cyclodeaminase/ 
µ-crystallin superfamily were used for generation of a sequence similarity 
network (SSN) using the EFI-EST web tool with a cut-off value of 
1x10. In this SSN, all connected sequences that shared 80% or more 
identity were grouped into a single node, resulting in 619 meta nodes. 
The SSN was visualized with Cytoscape 3.7.1 (https://cytoscape.org) 
and edges between nodes with less than 50% identity were removed. 


Analysis of Tara Oceans metagenomes 

BhcC from P. denitrificans DSM 413 (UniProt A1B8Z1) and Gcl from 
Starkeya novella DSM 506 (UniProt D7A6R1) were used as queries to 
search the OM-RGC v1 database using the Ocean Gene Atlas web tool 
(http://tara-oceans.mio.osupytheas.fr/ocean-gene-atlas) with a cut-off 
value of 1x10. The resulting hits were inspected and sequences that 
were deemed to not belong to BhcC or Gcl were removed. The follow- 
ing criteria were used: at least 50% of the query sequence covered; 
at least two of the three residues A160, A195, S313 present for BhcC 
sequences; residues V25, V51, L421, L476, L478, I479 present for Gcl 
sequences. The coordinates of sampling sites with positive hits for 
BhcC in samples from surface water (0.22-3-µm size fraction) were 
downloaded and visualized using Ocean Data View 5.1.5 (Schlitzer, R., 
Ocean Data View, odv.awi.de, 2018). Taxonomic assignments of BhcC 
and Gcl sequences were downloaded and manually converted to GTDB 
taxonomy. Sequence IDs are listed in Supplementary Data 2. 
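
The inclusion criteria above can be expressed as a small filter. This is a sketch under stated assumptions: the `residues` input (a mapping from query-numbered position to the amino acid aligned in the hit) is a hypothetical representation, and the Gcl rule is read as requiring all six listed residues. The published curation also involved manual inspection, which is not modelled here.

```python
# Diagnostic residues, numbered according to the respective query sequence.
BHCC_SITES = {160: "A", 195: "A", 313: "S"}
GCL_SITES = {25: "V", 51: "V", 421: "L", 476: "L", 478: "L", 479: "I"}


def keep_hit(query_coverage, residues, sites, min_matches, min_coverage=0.5):
    """Apply the coverage and conserved-residue criteria to one hit.

    query_coverage: fraction of the query covered by the alignment (0 to 1).
    residues: dict of query position -> aligned amino acid in the hit
              (hypothetical input format).
    """
    if query_coverage < min_coverage:
        return False
    matches = sum(1 for pos, aa in sites.items() if residues.get(pos) == aa)
    return matches >= min_matches


def is_bhcc(query_coverage, residues):
    # At least two of the three residues A160, A195 and S313.
    return keep_hit(query_coverage, residues, BHCC_SITES, min_matches=2)


def is_gcl(query_coverage, residues):
    # Assumption: all six listed residues must be present.
    return keep_hit(query_coverage, residues, GCL_SITES,
                    min_matches=len(GCL_SITES))
```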


Environmental sample collection and processing 

Sampling was carried out on each working day (Monday-Friday) with 
the RV Aade (https://www.awi.de/en/expedition/ships/more-ships. 
html) at the research site ‘Kabeltonne’ (54° 11.3’ N, 7° 54.0’ E) from 
approximately 1 m water depth in 20 l carboys. The water samples for 
microbial biomass were subjected to fractionating filtration directly 
upon arrival in the Biologische Anstalt Helgoland laboratory (typically 
less than one hour after sampling). Three membrane 142-mm diameter 
filtration units were operated in parallel to keep filtration times to a 
minimum. First, samples were pre-filtered through 142-mm diameter 
10-µm-pore-size polycarbonate filters (Merck Chemicals) by means of 
an air-pressure pump to remove large particles and eukaryotic plankton. 
Then, the water samples were filtered with air-pressure pumps 
onto 142-mm diameter 3-µm-pore-size polycarbonate filters (Merck 
Chemicals) to collect predominantly bacteria associated with smaller 
particles and algae. Afterwards, dedicated aliquots were filtered onto 
142-mm diameter 0.2-µm-pore-size polyethersulfone filters (Merck 
Chemicals) for DNA and RNA extraction. Bacterioplankton dominated 
this 0.2-µm fraction. The entire filtration process for all fractions was 
usually finished within 3 h, that is, at the latest 4 h after sampling. All filters 
were stored at −80 °C until further analyses. 


Total cell counts 

Samples were fixed with 1% formaldehyde and filtered onto polycar- 
bonate membrane filters as described above. Total cell counts were 
determined from 10 ml fixed seawater samples. One filter section was 
cut and stained with 4’,6-diamidino-2-phenylindole (DAPI, 1 µg ml⁻¹). 
The stained filters were analysed manually; the total cell count includes 
heterotrophic bacteria as well as autofluorescent cyanobacteria, but 
not picoeukaryotic cells. 


Concentration of Chl a 

The concentration of Chl a was determined in subsurface water on each 
working day (Monday-Friday) as part of the Helgoland Roads LTER 
time series (https://www.awi.de/en/science/biosciences/shelf-sea- 
system-ecology/working-groups/long-term-observations-Ito.html). 
The concentration of Chl a was assessed from fluorescence data using 
an algal group analyser (bbe moldaenke). 


Determination of glycolate concentrations 

Once per week, 5 aliquots of 2 ml each were taken from the filtrate after 
0.2-µm filtration and stored at −80 °C until analysis. Glycolate concentrations 
were measured after derivatization of the samples with 
3-nitrophenylhydrazine as previously described. LC-MS analyses were 
performed on an Agilent 6495B Triple Quad LC-MS system equipped 
with an electrospray ionization source. The analytes were separated on 
a RP-18 column (50 mm × 2.1 mm, particle size 1.8 µm, ZORBAX RRHD 
Eclipse Plus C18; Agilent) kept at 40 °C using a mobile phase system 
that consisted of 0.1% formic acid in water (A) and acetonitrile (B). 
The gradient was as follows: 0 min, 5% B; 1 min, 5% B; 6 min, 95% B; 
6.5 min, 95% B; 7 min, 5% B at a flow rate of 250 µl min⁻¹. Samples were 
held at 15 °C and injection volume was 5 µl. MS/MS data were acquired in 
negative MRM mode. Capillary voltage was set at 3 kV and nitrogen gas 
was used as nebulizing (25 psig), drying (11 l min⁻¹, 130 °C) and sheath 
gas (12 l min⁻¹, 400 °C). The dwell time and fragmentor voltage were 
20 ms and 380 V, respectively. Optimized collision energy used for the 
derivatized glycolate (210 m/z > 137 m/z) was 22 V. LC-MS data were 
analysed and quantified using MassHunter Qualitative Navigator and 
QQQ Quantitative Analysis software (Agilent). 
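
To make the gradient table above easier to read, the following sketch computes the percentage of solvent B at any time point. It assumes linear ramps between the listed time points, which is the usual behaviour of an LC pump but is not stated explicitly in the text; the names are illustrative.

```python
# (time in min, % solvent B) as listed in the gradient description.
GRADIENT = [(0.0, 5.0), (1.0, 5.0), (6.0, 95.0), (6.5, 95.0), (7.0, 5.0)]


def percent_b(t_min, gradient=GRADIENT):
    """Return % B at time t_min, assuming linear ramps between steps."""
    if t_min < gradient[0][0] or t_min > gradient[-1][0]:
        raise ValueError("time outside the programmed gradient")
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("gradient table is not sorted by time")


halfway_up = percent_b(3.5)  # midpoint of the 1 to 6 min ramp
```

Under this assumption the ramp from 5% to 95% B rises by 18% B per minute, so the midpoint of the ramp (3.5 min) lies at 50% B.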


Determination of glycolate uptake rates 

Samples for glycolate uptake measurements were collected on 10 April, 
15 May and 29 May 2018. All samples were used after filtration through 
a 3-µm filter and divided into 4 live 40 ml subsamples in sterile plastic 
tubes wrapped in aluminium foil and incubated with 165 nM calcium 
[1-¹⁴C]glycolate (American Radiolabelled Chemicals; 55 mCi mmol⁻¹, 
0.1 mCi ml⁻¹ in sterile water) at 12 °C for 8 h. Controls consisted of four 
40 ml subsamples killed in 10% formalin for 1 h before addition of 165 nM 
calcium [1-¹⁴C]glycolate. Glycolate uptake was monitored over time by 
withdrawing 5 ml aliquots from each subsample, filtering each aliquot 
onto a 0.2-µm pore size Nuclepore polycarbonate filter (GE Healthcare), 
rinsing the filter 3 times with 5 ml of filter-sterilized sea water and 
measuring the radioactivity with a Tri-Carb 4910 TR liquid scintillation 
analyser (PerkinElmer) using the Ultima Gold scintillation cocktail 
(PerkinElmer). Glycolate uptake rates were determined by linear fit 
of the counts per minute measured on the filters over time. Uptake 
rates were corrected to account for the presence of non-radioactive 
glycolate in the samples. 
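
The rate calculation can be sketched as an ordinary least-squares fit of filter counts against time, followed by a correction for unlabelled glycolate. The correction shown (scaling by total over added glycolate) is a common isotope-dilution form and is an assumption here, because the text does not give the formula; the time points and counts are invented for illustration.

```python
def linear_fit(times_h, cpm):
    """Least-squares slope and intercept of counts per minute versus time."""
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(cpm) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, cpm))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t


def dilution_corrected_rate(rate, added_nm, ambient_nm):
    """Scale a tracer-based rate to total glycolate (assumed correction).

    added_nm: labelled glycolate added (165 nM in the protocol above).
    ambient_nm: non-radioactive glycolate already present in the sample.
    """
    return rate * (added_nm + ambient_nm) / added_nm


# Hypothetical time course over the 8 h incubation.
slope, intercept = linear_fit([0, 2, 4, 8], [100, 300, 500, 900])
```

In practice the slope of the killed controls would be subtracted from that of the live subsamples before any correction is applied.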


DNA and RNA extraction, cDNA synthesis and qPCR 

DNA and RNA were extracted from filters using the AllPrep Bacterial 
DNA/RNA/Protein Kit (Qiagen) according to the manufacturer’s instruc- 
tions. The RNA samples were treated with the TURBO DNA-free Kit 
(Thermo Fisher Scientific) according to the manufacturer’s instructions 
to exclude contamination with DNA. DNA and RNA concentrations 
were determined using the Qubit dsDNA/RNA HS Assay Kit (Thermo 
Fisher Scientific) according to the manufacturer’s instructions. In total, 
2 µg of RNA was used for cDNA synthesis with the GoScript Reverse 
Transcription System (Promega) and random hexamers according to 
the manufacturer’s instructions. 

Degenerate primers for the bhcC gene were designed using the 
j-CODEHOP software and an alignment of 207 bhcC sequences from 
bacterial strains isolated from marine habitats. Sequences were aligned 
using MUSCLE. Extracted DNA and cDNA of RNA were quantified using 
a CFX Connect Real-Time System (Bio-Rad). SYBR Green JumpStart Taq 
ReadyMix (Sigma-Aldrich) was used for the PCR amplification mixture 
according to the manufacturer’s instructions. Final MgCl2 concentration 
was 3 mM, and the amplification protocol consisted of an initial 
enzyme activation step at 95 °C for 5 min, followed by 45 cycles of 95 °C 
for 30 s, 60 °C for 30 s, and 72 °C for 45 s. Eight standard amounts ranging 
from 3 × 10¹ to 3 × 10⁸ copies were run in triplicate for each set of 
analyses. Regression of all standard curves yielded an R² value of at least 
0.998. All samples were run in triplicate. The starting copy numbers of 
bhcC in DNA and cDNA were calculated based on regression parameters 
of standard curves, and gene/transcript copy numbers per cell were 
calculated based on the volume of sea water filtered, the microbial cell 
count at the time of sampling, the amount of extracted DNA or RNA, 
and the volume of DNA or cDNA used per reaction. The degenerate 
primers were validated with genomic DNA of P. denitrificans DSM 413, 
Rhodobacter sphaeroides 2.4.1, and E. coli K-12 MG1655 as template 
using the same qPCR protocol as above. Standards for quantification 
were created by PCR using genomic DNA of P. denitrificans DSM 413 as 
template. Purified bhcC PCR product was quantified using the Qubit 
dsDNA HS Assay Kit (Thermo Fisher Scientific) according to the manu- 
facturer’s instructions. 
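
The standard-curve step can be illustrated with a minimal sketch: quantification cycle (Cq) values of the standards are regressed against log10 of the copy number, and the fitted line is inverted to estimate starting copies in a sample. The function names are hypothetical, and the per-cell normalization described above (filtered volume, cell counts, extraction yield) is deliberately left out.

```python
import math


def fit_standard_curve(copies, cq):
    """Least-squares fit of Cq against log10(copy number).

    Returns (slope, intercept, r_squared).
    """
    x = [math.log10(c) for c in copies]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(cq) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in cq)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, cq))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


def copies_from_cq(cq, slope, intercept):
    """Invert the standard curve to get the starting copy number."""
    return 10 ** ((cq - intercept) / slope)


# Eight standards spanning the range given in the text.
STANDARD_COPIES = [3 * 10 ** k for k in range(1, 9)]
```

A fitted slope near -3.3 would correspond to roughly 100% amplification efficiency, which is a general property of qPCR standard curves rather than a value reported in the text.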


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


The coordinates and structure factors of the crystal structures gen- 
erated from this research are available at the PDB under accession 
numbers 6QKB and 6RQA. Mass spectrometry proteomics data are 
available via ProteomeXchange with the identifier PXD013274. MAGs 
are available under accession PRJEB28156 at the European Nucleotide 
Archive (ENA). All other relevant data are available in the Article and the 
Supplementary Information. Source Data for Figs. 2, 3 and Extended 
Data Figs. 1, 4, 5, 7-9 are provided with the paper. 
Extended Data Fig. 1 | Previously reported glycolate concentrations in 
environmental samples and cultures of photosynthetic organisms. a, Bar 
diagram of glycolate concentrations as previously reported in environmental 
samples. For details on samples, replicates, and analytics see b and the 
literature cited therein. b, Table of glycolate concentrations as previously 
reported in environmental samples (E1, E2 and so on) and cultures of 
photosynthetic organisms (C1, C2 and so on). When reported in the 
reference, the mean value ± error is given. 


[Panel a: bar diagram of glycolate (nM; axis from 0 to 1,200) by sample ID, E1 to E13.] 


Sample ID | Sample | Maximum glycolate concentration (nM) | Analytical method 
E1 | Coastal seawater (Ipswich Bay, MA, USA) | 1,026 | Colorimetry 
E2 | Atlantic Ocean (51°40’ S, 57°48’ W) | 342 ± 53 | Colorimetry 
E3 | Antarctic lake (60°42’ S, 45°37’ W) | 526 ± 154 | 
E4 | Coastal seawater (Ipswich Bay, MA, USA) | 552 | Colorimetry 
E5 | Coastal seawater (Menai Straits, Anglesey, UK) | 789 | Colorimetry 
E6 | Coastal seawater (Ipswich Bay, MA, USA) | 723 | Colorimetry 
E7 | Coastal seawater (New York Bight, NY, USA) | 475 ± 459 | Colorimetry 
E8 | Mediterranean Sea (42°28’ N, 30°16’ E) | 1,183 ± 66 | HPLC 
E9 | Atlantic Ocean (oligotrophic waters, 21°01’54” N, 31°09’62” W) | 224 ± 39 | HPLC 
E10 | Atlantic Ocean (mesotrophic waters, 18°27’22” N, 21°10’18” W) | 973 ± 53 | 
E11 | Atlantic Ocean (eutrophic waters, 20°31’49” N, 18°34’39” W) | 736 ± 79 | 
E12 | Mediterranean Sea (43°25’ N, 7°52’ E) | 79453 | GC 
E13 | Coastal seawater (Dabob Bay, WA, USA) | 100 ± 20 | HPLC 
C1 | Culture of Chlorella | 39,450-105,190 | 14C-tracing 
C2 | Culture of Euglena gracilis | 591,720 | Colorimetry 
C3 | Culture of Chaetoceros socialis | 240,630 | Colorimetry 
C4 | Culture of Dunaliella tertiolecta | 19,500 ± 1,125 | HPLC 
C5 | Culture of Thalassiosira weissflogii | 799 | HPLC 
C6 | Culture of Prochlorococcus MED4 (phosphorus-limited) | 1,873 ± 375 | HPLC 
C7 | Culture of Prochlorococcus MED4 (phosphorus-replete) | 2,831 ± 458 | 
C8 | Culture of Prochlorococcus MIT9312 (phosphorus-limited) | 749 ± 916 | 
C9 | Culture of Prochlorococcus MIT9312 (phosphorus-replete) | 333 ± 125 | 





Extended Data Fig. 2 | Crystal structure and phylogenetic analysis of the 
β-hydroxyaspartate aldolase BhcC. a, Cartoon representation of the 
β-hydroxyaspartate aldolase homodimer (PDB 6QKB) with superimposed 
protein surface (left, side view; right, top view). b, Active site of 
β-hydroxyaspartate aldolase with covalently bound PLP (light cyan). Active site 
residues highlighted in pink (A160, A195 and S313) are completely conserved 
only among β-hydroxyaspartate aldolases, but differ in D-threonine aldolases. 
c, Active site of D-threonine aldolase (PDB 4V15). The corresponding conserved 
residues among D-threonine aldolases (Q155, S190 and C303) are highlighted as 
in b. d, Maximum likelihood phylogenetic tree of the type III PLP-dependent 
protein superfamily. Sequences of the β-hydroxyaspartate aldolase BhcC and 
its homologues form a distinct clade (blue) within the D-threonine aldolase 
branch of this superfamily. Bootstrap values of at least 50 are given on the 
respective nodes. 
Extended Data Fig. 3 | Crystal structure and phylogenetic analysis of the 
iminosuccinate reductase BhcD. a, Cartoon representation of the 
iminosuccinate reductase homodimer (PDB 6RQA) with superimposed protein 
surface (left, side view; right, top view). b, Active site of BhcD with bound NAD+ 
(light cyan). Residues highlighted in pink (V39, R41, G52, K54 and H83) may 
contribute to substrate binding and are conserved among iminosuccinate 
reductases, but differ in L-alanine dehydrogenases. c, Active site of L-alanine 
dehydrogenase (PDB 1OMO). The corresponding conserved residues among 
L-alanine dehydrogenases (K41, Y43, R52, M54 and V81) are highlighted as in b. 
d, Maximum likelihood phylogenetic tree of the ornithine 
cyclodeaminase/µ-crystallin protein superfamily. Sequences of the 
iminosuccinate reductase BhcD and its homologues form a distinct clade (red) 
within this superfamily. Bootstrap values of at least 50 are given on the 
respective nodes. e, Sequence similarity network of 1,614 sequences from the 
ornithine cyclodeaminase/µ-crystallin protein superfamily. Connected 
sequences with more than 80% identity are clustered into nodes. The number 
in each node gives the number of sequences contained within. Nodes with 
more than 50% identity are connected by edges. Similar to the phylogenetic 
analysis shown in d, sequences of the iminosuccinate reductase BhcD and its 
homologues form a distinct clade (red) within this superfamily. 
Extended Data Fig. 4 | Michaelis-Menten kinetics of all enzyme reactions 
characterized in this study. a, Michaelis-Menten kinetics for 
aspartate-glyoxylate aminotransferase (BhcA). b, Michaelis-Menten kinetics for 
β-hydroxyaspartate dehydratase (BhcB). c, Michaelis-Menten kinetics for 
β-hydroxyaspartate aldolase (BhcC). d, Michaelis-Menten kinetics for 
iminosuccinate reductase (BhcD). a-d, Data are shown from n = 3 independent 
experiments at different substrate concentrations. The data are summarized 
in Table 1. 
Extended Data Fig. 5 | Physiological role of the BHAC in P. denitrificans DSM 
413. a, Growth rate of wild-type P. denitrificans DSM 413 on the BHAC 
substrates glycolate and glyoxylate. The middle line and box are the median 
and interquartile range of n = 6 independent experiments and the whiskers 
indicate the maximum range of the dataset. b, c, Representative growth curves 
of wild-type P. denitrificans DSM 413 (grey) and bhc deletion strains (coloured) 
grown in the presence of 60 mM glycolate (b) or 60 mM glyoxylate (c). Deletion 
of any single gene in the bhc gene cluster is sufficient to completely abolish 
growth in the presence of glycolate and glyoxylate. These experiments were 
repeated three times independently with similar results. d-f, Growth rates (µ) 
of wild-type P. denitrificans DSM 413 (grey) and BHAC deletion strains 
(coloured) grown in the presence of 60 mM acetate (d), 30 mM succinate (e) or 
20 mM glucose (f). Deletion of any single gene in the bhc gene cluster, or of the 
whole bhc gene cluster, still permits growth on acetate, succinate or glucose 
with comparable growth rates as for the wild type. Data are the mean ± s.d. of 
n = 3 independently grown cultures. g, Analysis of the proteome of glycolate-grown 
compared to succinate-grown P. denitrificans DSM 413. All proteins that 
were quantified by at least three unique peptides are shown. The 15 proteins 
that showed the strongest increase in abundance are marked in the volcano 
plot. The four enzymes of the BHAC are marked in red, the three subunits of 
glycolate oxidase in orange, the proteins of a putative operon for lactate 
utilization in white and the proteins directly downstream of the bhc gene 
cluster in light red. h, The abundance of these proteins, given as the percentage 
of the intensity-based absolute quantification (iBAQ) value. Data are the 
mean ± s.d. of n = 4 independently grown cultures. i, Specific activities of BHAC 
enzymes in cell-free extracts of glycolate-grown P. denitrificans DSM 413, as 
measured spectrophotometrically. Note that the activity of BhcD is plotted on 
the right y axis and consists of the actual iminosuccinate reductase activity 
(iminosuccinate to L-aspartate) as well as endogenous malate dehydrogenase 
activity (oxaloacetate to L-malate). j, Ratio of malate to aspartate determined 
by LC-MS during the enzyme assay for BhcD activity. The ratio remains 
approximately constant at 12:1, indicating that only approximately 8% of the 
activity (around 1.3 U mg⁻¹) shown in i can be ascribed to iminosuccinate 
reductase. i, j, Data are the mean ± s.d. of n = 3 independently grown cultures; 
each data point represents the mean of n = 3 technical replicates. k, DNA-binding 
properties of BhcR. Left, a fluorescently labelled DNA fragment 
carrying the putative promoter region of the bhc gene cluster (Pbhc) was 
incubated with increasing amounts of purified BhcR protein and subsequently 
separated by electrophoresis to visualize DNA bound to BhcR and free DNA; a 
DNA fragment derived from the coding region of bhcA was used as a negative 
control. BhcR specifically forms a complex with the DNA fragment containing 
the putative promoter region of the bhc gene cluster. Right, the Pbhc-BhcR 
complex was incubated with increasing concentrations of glyoxylate and 
subsequently separated by electrophoresis to assess the effect of glyoxylate on 
complex formation; the bhcA DNA fragment together with BhcR was used as a 
negative control. Increasing concentrations of glyoxylate decrease the binding 
of BhcR to the Pbhc DNA fragment. For gel source data, see Supplementary Fig. 1. 


Extended Data Fig. 6 | Phylogenetic analysis of the bhc gene cluster. 
a, Genome-based maximum likelihood phylogenetic tree of bacterial strains 
with the bhc gene cluster. The bhc gene cluster is found in 
Gammaproteobacteria (green), and in the alphaproteobacterial orders 
Rhizobiales (blue) and Rhodobacterales (red), as well as in one member each of 
Sphingomonadales and Kiloniellales. The phylogenetic tree is based on an 
alignment of 120 bacterial marker genes from 264 publicly available bacterial 
genomes and 5 MAGs and was calculated using GTDB-Tk 
(https://github.com/Ecogenomics/GtdbTk). If several strains from the same 
genus cluster together, nodes are collapsed at the genus level, and the size of 
the resulting circle corresponds to the respective number of strains. 
Loktanella*: collapsed node contains the MAGs 20110516_Bin_8_1 and 
20110523_Bin_9_1; Planktotalea**: collapsed node contains the MAG 
20110523_Bin_97_1; Litoricola***: collapsed node contains the MAG 
20110526_Bin_19_1. b, Maximum likelihood phylogenetic tree of concatenated 
BHAC enzyme sequences. Colour code is the same as in a. Phylogenetic groups 
that were mostly isolated from terrestrial or freshwater habitats are marked 
with a black dot. Comparison with a reveals that the sequences of the BHAC 
enzymes are not phylogenetically representative, as, for example, alpha- and 
gammaproteobacterial sequences form a common branch and sequences from 
terrestrial or freshwater Rhizobiales and Rhodobacterales form another common 
branch. This suggests that the bhc gene cluster might have been subject to 
horizontal gene transfer between distantly related strains in shared habitats. 
The environmental bhc gene cluster sequence that could not be binned 
successfully is marked in bold and clusters together with isolated 
representatives of Pseudoruegeria, Litoreibacter and Pseudooceanicola. The 
phylogenetic tree is based on the concatenated alignments of the 4 enzymes 
(BhcA-BhcD) from 264 publicly available bacterial genomes and from 6 
metagenome contigs. It was calculated using raxmlGUI. Bootstrap values of 
at least 50 are given on the respective nodes; calculated branch lengths of the 
tree are ignored for the sake of better visualization. If several strains from the 
same genus cluster together, nodes are collapsed at the genus level, and the 
size of the resulting circle corresponds to the respective number of strains. If 
strains from the same genus cluster in more than one node, the respective 
branches are labelled as Genus_1, Genus_2, and so on, in a clockwise manner. 
Loktanella_2*: collapsed node contains the MAGs 20110516_Bin_8_1 and 
20110523_Bin_9_1; Planktotalea**: collapsed node contains the MAG 
20110523_Bin_97_1; Litoricola***: collapsed node contains the MAG 
20110526_Bin_19_1. a, b, Taxonomy is based on GTDB (release 03-RS86; 
http://gtdb.ecogenomic.org/). All strains contained in the phylogenetic trees 
are listed in Supplementary Data 1. 


Extended Data Fig. 7 | Glyoxylate assimilation pathways in marine 
metagenomes. a, Metagenomes collected during the Tara Oceans expedition 
were searched for the presence of BhcC as representative enzyme of the BHAC. 
Dots on the map mark sampling locations of metagenomes containing BhcC 
sequences; the colour of the dot corresponds to BhcC abundance in samples 
from surface water (0.22-3-µm size fraction), as shown in the legend. The map 
was made with Ocean Data View 5.1.5 (Schlitzer, R., Ocean Data View, 
odv.awi.de, 2018). b, Phylogenetic distribution of 104 unique BhcC sequences 
found in Tara Oceans metagenomes. c, Phylogenetic distribution of 32 unique 
Gcl (as representative enzyme of the glycerate pathway) sequences found in 
Tara Oceans metagenomes. Whereas BhcC is mainly found in Alphaproteobacteria, 
Gcl is largely restricted to Gammaproteobacteria. b, c, Taxonomy is based on 
GTDB (release 03-RS86; http://gtdb.ecogenomic.org/). d, Ratio of the 
abundances (in percentage of total reads) of BhcC to Gcl in Tara Oceans 
metagenomes. BhcC:Gcl ratios from n = 210 samples are plotted together (left) 
and clustered by sampling depth (SRF, upper layer zone (n = 101); DCM, deep 
chlorophyll maximum layer (n = 68); MES, mesopelagic zone (n = 41)). Samples 
from the 0.22-3-µm size fraction are denoted by a black dot; samples from the 
<0.22-µm size fraction are denoted by a blue dot. The median is shown in 
orange as centre value and error bars represent interquartile ranges. The 
median BhcC:Gcl ratio of all samples is 18.7. The highest BhcC:Gcl ratio is found 
in surface water samples (median = 41.8), with the ratio generally being higher 
in the 0.22-3-µm size fraction than in the <0.22-µm size fraction. Sequence IDs, 
abundances and BhcC:Gcl ratios are given in Supplementary Data 2. 
Extended Data Fig. 8 | Abundance of the bhc gene cluster in Helgoland 
metagenomes. a, The location of Helgoland Island approximately 40 km 
offshore the northern German coastline in the North Sea is marked with a red 
dot. The map was made with Ocean Data View 5.1.5 (R. Schlitzer, Ocean Data 
View, odv.awi.de, 2018). b, The long-term ecological research site ‘Kabeltonne’ 
(red dot: 54° 11.3' N, 7° 54.0' E) is located between Helgoland Island (left) and 
the small island Düne (right). Satellite image from WorldWind Explorer 
(B. Schubert, worldwind.earth/explorer, 2016-2018); the image was adapted to 
indicate the sampling site. c-e, Abundance of the bhc gene cluster (in RPKM) 
was calculated in 38 metagenomes from samples collected during the algal 
spring blooms of 2010 to 2012 in the North Sea close to Helgoland. Six 
different sequences were investigated, five of which could be assigned to 
metagenome bins (Extended Data Fig. 6 and Supplementary Data 3), whereas 
the remaining, most abundant sequence (black) could not be binned 
successfully. 
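RPKM (reads per kilobase per million reads) is a standard abundance measure normalized for gene length and sequencing depth. The sketch below shows the usual definition. Whether the denominator counts all reads or only mapped reads is not stated in this legend, so the `total_reads` argument, the function name and the example numbers are assumptions.

```python
def rpkm(mapped_reads, gene_length_bp, total_reads):
    """Reads per kilobase of gene per million reads in the sample."""
    if gene_length_bp <= 0 or total_reads <= 0:
        raise ValueError("gene length and total read count must be positive")
    # 1e9 = 1e3 (bp per kilobase) * 1e6 (reads per million)
    return mapped_reads * 1e9 / (gene_length_bp * total_reads)


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
value = rpkm(mapped_reads=150, gene_length_bp=1_500, total_reads=20_000_000)
print(value)
```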
Extended Data Fig. 9 | Validation of degenerate bhcC primers. Degenerate 
primers for bhcC were used for qPCR with different amounts of genomic DNA 
from P. denitrificans DSM 413, Rhodobacter sphaeroides 2.4.1, and E. coli K-12 
MG1655 as template. While the bhcC gene from P. denitrificans DSM 413 is 
amplified, genomic DNA from organisms that lack the bhc gene cluster does 
not result in reliable amplification. Data are mean ± s.d.; n = 3 independent 
experiments. 
Extended Data Table 1 | X-ray diffraction data collection and 
model refinement statistics 

                          β-Hydroxyaspartate aldolase       Iminosuccinate reductase 
                          with bound pyridoxal phosphate    with bound NAD+ 
                          (PDB ID 6QKB)                     (PDB ID 6RQA) 

Data collection 
Space group               P2₁2₁2₁                           P2₁2₁2₁ 
Cell dimensions 
  a, b, c (Å)             66.60, 75.25, 157.31              50.39, 72.41, 164.27 
  α, β, γ (°)             90.00, 90.00, 90.00               90.00, 90.00, 90.00 
Resolution (Å)            29.03 - 1.70 (1.79 - 1.70)        29.40 - 2.56 (2.70 - 2.56) 
Rmerge                    0.134 (0.858)                     0.097 (0.527) 
I/σI                      10.4 (1.9)                        12.6 (3.3) 
CC1/2 (%)                 99.7 (70.8)                       99.8 (90.7) 
Completeness (%)          99.8 (99.0)                       99.9 (100.0) 
Redundancy                6.7 (6.5)                         6.5 (6.5) 

Refinement 
Resolution (Å)            29.03 - 1.70 (1.74 - 1.70)        29.40 - 2.56 (2.70 - 2.56) 
No. unique reflections    87194 (5909)                      20152 (2893) 
Rwork / Rfree             0.158 / 0.177                     0.176 / 0.225 
No. atoms                 6671                              5027 
  Protein                 5817                              4780 
  Ligands                 32                                120 
  Water                   822                               127 
B-factors 
  Protein                 17.05                             52.02 
  Ligands                 23.66                             66.03 
  Water                   31.58                             47.34 
R.m.s. deviations 
  Bond lengths (Å)        0.006                             0.004 
  Bond angles (°)         0.84                              0.52 

Numbers in parentheses indicate statistics for the highest resolution shell. The structures 
were determined from single crystals. 
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The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
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For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 
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Sample size No sample size calculation was performed. The common rationale was to have a sample size that allows us to calculate a standard deviation. 
All Michaelis-Menten plots for the kinetic characterization of enzymes include 18 data points (three independent experiments at 6 different 
substrate concentrations). Controls verifying same levels of specific activities between different enzyme preparations were performed 
routinely. 

Three independent experiments (= assays in independent cuvettes) were conducted for the determination of enzyme activities in vitro, both with purified enzymes and in P. denitrificans cell extracts. 

Three biological replicates (= independent cultures) of P. denitrificans were used to generate cell-free extracts for determination of enzyme activities. 

Five biological replicates (= independent water samples) were used to determine glycolate concentrations in seawater. 

Three technical replicates (= repeated injections of the same sample) were measured for the determination of glycolate concentrations in each single seawater sample via LC-MS. 
Three or six biological replicates (= independent cultures in different wells of a 96-well plate) were measured for the determination of P. denitrificans growth rates. 

Four biological replicates (= independent cultures) per condition were used to generate biomass for proteomic analysis. 

Four biological replicates (= independent water samples) were used to determine glycolate uptake rates. 


Data exclusions No data were excluded from the analyses. 

Replication Controls verifying same levels of specific activities between different enzyme preparations were performed routinely. When applicable, we 
performed our experiments using multiple independent samples (e.g., independent cultures, independent water samples). All attempts at 
replication of our findings were successful. 


Randomization The microorganisms used in this study were selected and divided randomly among the different conditions. No selection criteria were applied. 
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charge of the analysis was the one responsible for taking the samples. 
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PXD013274. Metagenome-assembled genomes (MAGs; Extended Data Fig. 6 + 8) are available under accession PRJEB28156 at the European Nucleotide Archive 
(ENA). All other relevant data are available in this article and its Supplementary Information files. 


Field-specific reporting 


Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection. 

[_] Life sciences    [_] Behavioural & social sciences    [X] Ecological, evolutionary & environmental sciences 


For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf 


Ecological, evolutionary & environmental sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Study description 
Investigation of spring phytoplankton bloom 2018 at Helgoland island. Data on chlorophyll A concentrations, total cell count, glycolate concentrations and uptake rates, and abundance of bhcC genes and transcripts in microbial biomass from water samples are quantitative. 

Research sample 
Water samples were taken from a designated location (see below) close to Helgoland island; no statistical method was used to predetermine sample size. This sampling site was chosen because it had been used in previous sampling campaigns by the MPI Bremen and the Biologische Anstalt Helgoland. Therefore, it was relevant to take water samples in 2018 that could be compared to samples taken at the same site in previous years. 

Sampling strategy 
Five biological replicates (= independent water samples) were used to determine glycolate concentrations in seawater. Four biological replicates (= independent water samples) were used to determine glycolate uptake rates. One water sample was used to determine total cell count and chlorophyll A concentration. One water sample was used to obtain microbial biomass for DNA and RNA extraction. No sample size calculation was performed. The common rationale for replicate samples was to have a sample size that allows us to calculate a standard deviation. 

Data collection 
Sampling was carried out on each working day (Monday - Friday) by the crew of the RV Aade (www.awi.de/en/expedition/ships/more-ships.html) at the research site ‘Kabeltonne’ (54° 11.3' N, 7° 54.0' E) from approximately 1 m water depth in 20 L carboys. The water samples for microbial biomass were subjected to a fractionating filtration directly upon arrival in the Biologische Anstalt Helgoland laboratory (typically less than one hour after sampling). The entire filtration process for all fractions was usually finished within 3 h, i.e. at the latest 4 h after the sampling. Further data collection was carried out by the staff of Biologische Anstalt Helgoland and MPI Bremen (chlorophyll A concentrations, total cell count), or Lennart Schada von Borzyskowski (glycolate concentrations and uptake rates, abundance of bhcC genes and transcripts), together with Peter Claus and Nina Socorro Cortina (glycolate concentrations). 

Timing and spatial scale 
Water samples were collected once daily in the morning (between 8 and 10 am) on each working day between March 1 and May 31, 2018. Water samples were collected in the North Sea close to Helgoland island, at the research site ‘Kabeltonne’ (54° 11.3' N, 7° 54.0' E), from approximately 1 m water depth. 

Data exclusions 
No data were excluded from the analyses. 

Reproducibility 
Not applicable, since sampling was carried out once daily. 

Randomization 
The water samples used in this study were selected and divided randomly in the different conditions to determine glycolate uptake rates. No criteria of selection were applied. 

Blinding 
Blinding of samples was not applicable for the kind of experiments included in this study, since water samples (in replicates) were taken on different dates and labeled accordingly. 

Did the study involve field work? Yes No 


Field work, collection and transport 


Field conditions 
Water samples were collected once daily in the morning (between 8 and 10 am) on each working day between March 1 and May 31, independent of temperature or weather conditions. 

Location 
Water samples were collected in the North Sea close to Helgoland island, at the research site ‘Kabeltonne’ (54° 11.3' N, 7° 54.0' E), from approximately 1 m water depth. 

Access and import/export 
Water samples were collected without harming the habitat. No import or export of samples was conducted in the frame of this study. 

Disturbance 
No disturbance was caused by the study. 
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Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems      Methods 
n/a | Involved in the study           n/a | Involved in the study 
Antibodies                            ChIP-seq 
Eukaryotic cell lines                 Flow cytometry 
Palaeontology                         MRI-based neuroimaging 
Animals and other organisms 
Human research participants 
Chronic liver disease due to alcohol-use disorder contributes markedly to the global 
burden of disease and mortality. Alcoholic hepatitis is a severe and life-threatening 
form of alcohol-associated liver disease. The gut microbiota promotes ethanol-induced 
liver disease in mice, but little is known about the microbial factors that are 
responsible for this process. Here we identify cytolysin, a two-subunit exotoxin that 
is secreted by Enterococcus faecalis, as a cause of hepatocyte death and liver injury. 
Compared with non-alcoholic individuals or patients with alcohol-use disorder, 
patients with alcoholic hepatitis have increased faecal numbers of E. faecalis. The 
presence of cytolysin-positive (cytolytic) E. faecalis correlated with the severity of 
liver disease and with mortality in patients with alcoholic hepatitis. Using humanized 
mice that were colonized with bacteria from the faeces of patients with alcoholic 
hepatitis, we investigated the therapeutic effects of bacteriophages that target 
cytolytic E. faecalis. We found that these bacteriophages decrease cytolysin in the 
liver and abolish ethanol-induced liver disease in humanized mice. Our findings link 
cytolytic E. faecalis with more severe clinical outcomes and increased mortality in 
patients with alcoholic hepatitis. We show that bacteriophages can specifically target 
cytolytic E. faecalis, which provides a method for precisely editing the intestinal 
microbiota. A clinical trial with a larger cohort is required to validate the relevance of 
our findings in humans, and to test whether this therapeutic approach is effective for 
patients with alcoholic hepatitis. 


The most severe form of alcohol-related liver disease is alcoholic hepatitis; 
mortality ranges from 20% to 40% at 1-6 months, and as many as 
75% of patients die within 90 days of a diagnosis of severe alcoholic 
hepatitis. Therapy with corticosteroids is only marginally effective. 
Early liver transplantation is the only curative therapy, but is offered 
only at select centres and to a limited group of patients. 

Alcohol-related liver disease can be transmitted via faecal microbiota. 
We investigated the microorganisms and microbial factors that 
are responsible for this transmissible phenotype and for progression 
of alcohol-related liver disease. 


Cytolysin linked to increased mortality 


We performed 16S ribosomal RNA (rRNA) gene sequencing to determine 
whether chronic alcohol use and alcoholic hepatitis are associated 
with an altered composition of the faecal microbiota. Differences in 
faecal microbiota composition were noted in patients with alcohol-use 
disorder and alcoholic hepatitis, compared to subjects without 
alcohol-use disorder (controls) (Fig. 1a, Extended Data Fig. 1a, b, 
Supplementary Tables 1, 2). One substantial difference that we observed was 
an increase in the proportion of Enterococcus spp. in patients with alcoholic 
hepatitis: in these patients, 5.59% of faecal bacteria were Enterococcus spp. 
compared with almost none in controls (0.023%; for comparison, 
0.004% of all reads were Enterococcus spp. in the Human Microbiome 
Project) or patients with alcohol-use disorder (0.024%). Faecal samples 
from patients with alcoholic hepatitis had about 2,700-fold more 
E. faecalis than samples from controls, as measured by quantitative PCR 
(qPCR) (Extended Data Fig. 1c), which is consistent with the 16S rRNA 
sequencing results. About 80% of patients with alcoholic hepatitis are 
positive for E. faecalis in their faeces (Extended Data Fig. 1d). 


A list of affiliations appears at the end of the paper. 
Fig. 1 | E. faecalis cytolysin is associated with mortality in patients with 
alcoholic hepatitis. a, 16S rRNA sequencing of faecal samples from controls 
(n = 14), patients with alcohol-use disorder (n = 43) or alcoholic hepatitis (n = 75). 
We use principal coordinate analysis (PCoA) based on Jaccard dissimilarity 
matrices to show β-diversity among groups at the genus level. The composition 
of faecal microbiota was significantly different between each group (P < 0.01). 
b, Percentage of subjects with faecal samples that were positive for both cylLL 
and cylLS DNA sequences (cytolysin-positive), in controls (n = 25), patients with 
alcohol-use disorder (n = 38) or alcoholic hepatitis (n = 82), assessed by qPCR. 
Statistically significant differences were detected between controls and 
patients with alcoholic hepatitis (P < 0.01), and between patients with 
alcohol-use disorder and patients with alcoholic hepatitis (P < 0.001). c, Kaplan-Meier 
curve of survival of patients with alcoholic hepatitis whose faecal samples were 
cytolysin-positive (n = 25) or cytolysin-negative (n = 54) (P < 0.0001). d, Core 
genome single-nucleotide polymorphism (SNP) tree of E. faecalis strains 
isolated from patients with alcoholic hepatitis (n = 93 strains, from 24 patients), 
showing phylogenetic diversity of cytolysin-positive (red) E. faecalis. 
Genomically identical isolates from the same patient were combined, and are 
shown as a single dot. Scale bar represents the nucleotide substitutions per 
SNP site. P values are determined by permutational multivariate analysis of 
variance (PERMANOVA) followed by false discovery rate (FDR) procedures (a), 
two-sided Fisher’s exact test followed by FDR procedures (b) or two-sided 
log-rank (Mantel-Cox) test (c). The exact group size (n) and P values for each 
comparison are listed in Supplementary Table 10. 
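The Jaccard dissimilarity that underlies the PCoA in panel a compares two samples by the genera they share. A minimal sketch of the pairwise calculation follows; the function names and the genus lists are illustrative assumptions, not data from the study, and the ordination and PERMANOVA steps are not shown.

```python
def jaccard_dissimilarity(genera_a, genera_b):
    """1 - |A intersect B| / |A union B| for two collections of detected genera."""
    a, b = set(genera_a), set(genera_b)
    union = a | b
    if not union:
        return 0.0  # two empty samples are treated as identical
    return 1.0 - len(a & b) / len(union)


def dissimilarity_matrix(samples):
    """Pairwise Jaccard dissimilarities for a dict of sample name -> genera."""
    names = sorted(samples)
    return {
        (x, y): jaccard_dissimilarity(samples[x], samples[y])
        for x in names
        for y in names
    }


# Hypothetical presence lists at the genus level.
samples = {
    "control": ["Bacteroides", "Faecalibacterium", "Roseburia"],
    "hepatitis": ["Bacteroides", "Enterococcus", "Streptococcus"],
}
matrix = dissimilarity_matrix(samples)
print(matrix[("control", "hepatitis")])
```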


The colonization of mice with E. faecalis induces mild hepatic steatosis 
and exacerbates ethanol-induced liver disease, by mechanisms 
that are unclear. Cytolysin is a bacterial exotoxin (or bacteriocin) that is 
produced by E. faecalis, and which contains two post-translationally 
modified peptides (CylLL″ and CylLS″) in its bioactive form. The two 
peptides are encoded by two separate genes: cylLL and cylLS, respectively. 
Cytolysin has lytic activity against not only Gram-positive bacteria, 
but also eukaryotic cells. We detected cylLL and cylLS genomic DNA 
(cytolysin-positive) in faecal samples from 30% of patients with alcoholic 
hepatitis; none of the faecal samples from controls and only one 
sample from a patient with alcohol-use disorder was cytolysin-positive, 
as detected by qPCR (Fig. 1b). Importantly, 89% of cytolysin-positive 
patients with alcoholic hepatitis died within 180 days of admission, 
compared to only 3.8% of cytolysin-negative patients (P < 0.0001) 
(Fig. 1c). Among the cytolysin-positive patients, 72.2% (13 out of 18) 
died owing to liver failure (including complications related to liver 
failure, such as gastrointestinal bleeding) (Supplementary Table 2). 
Infection was not associated with 30-day, 90-day or 180-day mortality 
(P = 0.403, 0.234 or 0.098) in patients with alcoholic hepatitis. 
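The group comparison of cytolysin positivity (Fig. 1b) relies on a two-sided Fisher's exact test. A minimal pure-Python sketch of that test for a single 2×2 table is given below. The example counts are approximations reconstructed from the percentages and group sizes quoted in the text (about 30% of 82 patients, none of 25 controls) and are therefore assumptions; the FDR correction applied in the study is not reproduced, so the printed value is not expected to match the published P value.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are at least as unlikely as the observed one
    # (small relative tolerance guards against floating-point ties).
    return sum(
        p for p in map(prob, range(lowest, highest + 1))
        if p <= observed * (1 + 1e-9)
    )


# Assumed counts: rows are controls and alcoholic hepatitis,
# columns are cytolysin-positive and cytolysin-negative.
p_example = fisher_exact_two_sided(0, 25, 25, 57)
print(f"P = {p_example:.2g}")
```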

Our univariate logistic and Cox regression of laboratory and clinical 
parameters found an association between the detection of cytolysin-encoding 
genes in faeces and the international normalized ratio (INR), 
platelet count, the model for end-stage liver disease (MELD) score, the 
sodium MELD score, the age, serum bilirubin, INR and serum creatinine 
(ABIC) score and death (Supplementary Table 3). In the multivariate Cox 
analysis, detection of cytolysin-encoding genes in faeces was associated 
with 90-day (P = 0.004) and with 180-day mortality (P = 0.001) 
(Supplementary Table 3), even after we adjusted for the geographical origin of 
the patient, antibiotic treatment, platelet count, and creatinine, bilirubin 
and INR as components of the MELD score. We found no multicollinearity 
between the detection of faecal cytolysin-encoding genes and these 
cofactors (variance inflation factor <1.6), which indicates that cytolysin is 
an independent predictor of mortality in patients with alcoholic hepatitis. 
When we performed receiver-operating-characteristic curve analysis for 
90-day mortality, cytolysin had an area under the curve of 0.81, which was 
superior to other widely used predictors for mortality in clinical practice 
(Extended Data Fig. 1e). On the basis of our findings, we propose that the 
detection of cytolysin may be a prognostic factor for more severe 
liver-related outcomes and increased risk of death, and a stronger predictor of 
mortality than MELD, ABIC and the discriminant function score. 
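The area under the receiver-operating-characteristic curve reported above can be read as the probability that a randomly chosen patient who died has a higher marker value than a randomly chosen survivor, with ties counted as one half. The sketch below computes the AUC in that pairwise form. The function name and the eight hypothetical patients are assumptions for illustration only and do not reproduce the study data or its value of 0.81.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a positive > score of a negative) + 0.5 * P(tie)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical cohort: marker = 1 if cytolysin-positive, label = 1 if the
# patient died within 90 days.
marker = [1, 1, 1, 0, 0, 0, 0, 1]
died = [1, 1, 1, 0, 0, 0, 1, 0]
auc = roc_auc(marker, died)
print(auc)
```

The same function accepts continuous scores such as MELD, which is how predictors of different types can be compared on one scale.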

To determine phylogeny of E. faecalis in patients with alcoholic 
hepatitis, we performed targeted culturing from stool samples. 
Whole-genome sequencing of 93 E. faecalis isolates revealed a broad 
phylogenetic diversity of cytolysin-positive E. faecalis from patients with 
alcoholic hepatitis (Fig. 1d), which indicates that cytolysin production 
is a variable trait among E. faecalis isolates and that cytolysin is carried 
in mobile genetic elements, which include both chromosomally 
encoded pathogenicity islands and plasmids. Detection of any other 
antimicrobial resistance genes or virulence genes in E. faecalis isolates 
did not correlate with disease severity or mortality in patients with 
alcoholic hepatitis (Supplementary Table 4). 

The total amount of faecal E. faecalis, or faecal E. faecalis positivity, 
did not correlate with disease severity or mortality in patients with 
alcoholic hepatitis (Supplementary Tables 5, 6). Cytolysin-positive 
and cytolysin-negative patients with alcoholic hepatitis had similar 
amounts of faecal E. faecalis (Extended Data Fig. 1f). Although there 
were differences in the composition of the gut microbiota in patients 
with alcoholic hepatitis from different geographical regions (Extended 
Data Fig. 1g), the proportion of cytolysin-positive patients, total amount 
of faecal E. faecalis, faecal E. faecalis positivity (Extended Data Fig. 1h-j), 
treatment and clinical outcomes (30-day and 90-day mortality) did 
not differ significantly among the regions or centres (Supplementary 
Table 7). In addition, cirrhosis was not associated with cytolysin 
positivity, the total amount of faecal E. faecalis or faecal E. faecalis 
positivity in patients with alcoholic hepatitis (Extended Data Fig. 1k-m, 
Supplementary Tables 4-6). These results confirm our findings that 
the presence of cytolysin-producing E. faecalis rather than the total 
amount or presence of E. faecalis per se determines the severity of 
alcoholic hepatitis and mortality. 


Cytolysin and ethanol-induced liver disease 


To determine whether cytolysin contributes to liver damage mediated 
by E. faecalis, we gavaged mice with a cytolytic E. faecalis strain 
(FA2-2(pAM714)) or a non-cytolytic E. faecalis strain (FA2-2(pAM771)); the 
mice were then placed on a chronic-binge ethanol diet. Compared 
to mice gavaged with phosphate-buffered saline (PBS), mice fed with 
ethanol after they were gavaged with cytolytic E. faecalis developed 
more severe liver injury as indicated by a higher level of alanine 
aminotransferase (ALT) (Extended Data Fig. 2a) and increased hepatic 
steatosis (Extended Data Fig. 2b, c). Mice that were fed ethanol after they 
were gavaged with cytolytic E. faecalis also had more liver inflammation 
with higher expression levels of mRNAs that encode inflammatory 
cytokines and chemokines (Il1b, Cxcl1 and Cxcl2) (Extended Data 
Fig. 2d-f), compared with mice given PBS. Mice that were fed ethanol 
after they were gavaged with non-cytolytic E. faecalis had significantly 
less ethanol-induced liver injury, steatosis and inflammation (Extended 
Data Fig. 2a-f) and longer survival times (Extended Data Fig. 2g), as 
compared with mice that were fed ethanol after they were administered 
with cytolytic E. faecalis. 

Fig. 2 | Transplantation of faeces from cytolysin-positive patients with 
alcoholic hepatitis exacerbates ethanol-induced liver disease in 
gnotobiotic mice. a-g, C57BL/6 germ-free mice were colonized with faeces 
from two cytolysin-positive and two cytolysin-negative patients with alcoholic 
hepatitis, and subjected to the chronic-binge feeding model. a, Serum levels of 
ALT. b, Hepatic triglyceride content. c, Representative sections of liver stained 
with haematoxylin and eosin (H&E). d-f, Hepatic levels of mRNAs that encode 
Il1b, Cxcl1 and Col1a1. g, Proportions of mice that were positive for cytolysin in 
the liver, measured by qPCR for cylLS. h, Lactate dehydrogenase (LDH) assay to 
measure cytotoxicity of hepatocytes isolated from mice that were fed an oral 
isocaloric control diet (five groups, left) or chronic-binge ethanol diet (five 
groups, right), and incubated with vehicle, CylLL″, CylLS″ or both of the 
cytolysin subunits at the indicated concentrations without (-) or with (+) 
ethanol (25 mM) for 3 h. The survival of hepatocytes was determined in three 
independent experiments. Scale bar, 100 µm. Results are expressed as 
mean ± s.e.m. (a, b, d-f, h). P values are determined by one-way analysis of 
variance (ANOVA) with Tukey’s post hoc test (a, b, d-f), two-sided Fisher’s exact 
test followed by FDR procedures (g) or two-way ANOVA with Tukey’s post hoc 
test (h). All results were generated from at least three independent replicates. 
The exact group size (n) and P values for each comparison are listed in 
Supplementary Table 10. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001. 

To explore the mechanism of cytolysin-associated liver damage, 
we measured cytolysin in the liver. CylLS was significantly increased 
in the liver of mice given cytolytic E. faecalis but not in the liver of 
mice that were not given E. faecalis or of mice gavaged with non-cytolytic 
E. faecalis after chronic ethanol administration (Extended Data 
Fig. 2h). E. faecalis was detectable in the liver of mice given cytolytic and 
non-cytolytic E. faecalis and fed an ethanol diet, but not when mice were 
fed an isocaloric (control) diet (Extended Data Fig. 2i); this indicates 
that ethanol-induced changes in the gut barrier are necessary for the 
translocation of cytolytic E. faecalis from the intestine to the liver. The 
livers of ethanol-fed mice that were given cytolytic or non-cytolytic 
E. faecalis had positive E. faecalis cultures (Extended Data Fig. 2j). We 
observed an increased intestinal permeability in ethanol-fed mice compared 
with mice fed with an isocaloric diet, but this was independent 
of gavaging cytolytic or non-cytolytic E. faecalis after chronic ethanol 
administration (Extended Data Fig. 2k), indicating that cytolysin does 
not affect intestinal barrier function. 

Administration of cytolytic or non-cytolytic E. faecalis to mice did not 
significantly change the composition of the intestinal microbiota, as 
shown by 16S rRNA gene sequencing (Extended Data Fig. 2l). Cytolytic 
E. faecalis did not affect intestinal absorption or hepatic metabolism 
of ethanol, as indicated by serum levels of ethanol and hepatic levels 
of Adh1 and Cyp2e1 mRNAs (which encode the two primary enzymes 
that metabolize ethanol in the liver) (Extended Data Fig. 2m, n). These 
results indicate that E. faecalis that produce cytolysin promote 
ethanol-induced liver disease in mice. 

Fig. 3 | Phage therapy against cytolytic E. faecalis abolishes ethanol-induced 
liver disease in gnotobiotic mice. a, Transmission electron microscopy 
revealed that the phages we isolated were either siphophages (Ef5.1, Ef5.2, 
Ef5.3, Ef5.4 and Ef2.2) or myophages (Ef2.1 and Ef2.3). Scale bar, 50 nm. 
b-h, C57BL/6 germ-free mice were colonized with faeces from two 
cytolysin-positive patients with alcoholic hepatitis (faeces from one of these patients 
were also used in Fig. 2) and subjected to the chronic-binge feeding model, 
gavaged with control phages against C. crescentus (10¹⁰ plaque-forming units 
(PFUs)) or a cocktail of three or four different phages that target cytolytic 
E. faecalis (10¹⁰ PFUs), 1 day before an ethanol binge. b, Serum levels of ALT. 
c, Hepatic triglyceride content. d, Representative H&E-stained liver sections. 
Scale bar, 100 µm. e-g, Hepatic levels of mRNAs that encode Il1b, Cxcl1 and 
Col1a1. h, Proportions of mice that were positive for cytolysin in the liver, 
measured by qPCR for cylLS. Results are expressed as mean ± s.e.m. 
(b, c, e-g). P values are determined by two-way ANOVA with Tukey’s post hoc 
test (b, c, e-g) or two-sided Fisher’s exact test followed by FDR procedures (h). 
All results are generated from at least three independent replicates. The exact 
group size (n) and P values for each comparison are listed in Supplementary 
Table 10. *P < 0.05, **P < 0.01, ***P < 0.001. 

To extend our findings to humans, we colonized germ-free mice 
with faeces from cytolysin-positive and cytolysin-negative patients 
with alcoholic hepatitis (Supplementary Table 8). Consistent with 
our findings from mice colonized with cytolytic E. faecalis, gnotobiotic 
C57BL/6 mice colonized with faeces from two cytolysin-positive 
patients developed more severe ethanol-induced liver injury, steatosis, 
inflammation and fibrosis than mice given faeces from two 
cytolysin-negative patients (Fig. 2a-f, Extended Data Fig. 3a-d). Transplantation 
of faeces from cytolysin-positive patients reduced the survival time 
of the mice (Extended Data Fig. 3e) and increased translocation of 
cytolytic E. faecalis to the liver after ethanol administration (Fig. 2g). 
The overall composition of the intestinal microbiota was not different 
between mice fed the control diet and colonized with faeces from 
cytolysin-positive or cytolysin-negative donors with alcoholic hepatitis, 
as shown by 16S rRNA gene sequencing. Mice transplanted with faeces 
from one of the cytolysin-positive patients with alcoholic hepatitis 
(patient no. 2) showed a microbiota that was significantly different 
from that of the other mouse groups after ethanol administration 
(Extended Data Fig. 3f). Non-cytolytic E. faecalis was not detected 
in stool samples from donors with cytolytic E. faecalis (Extended 
Data Fig. 3g). We did not observe differences in intestinal absorption 
or hepatic metabolism of ethanol between mice colonized with 
faeces from cytolysin-positive versus cytolysin-negative patients 
(Extended Data Fig. 3h, i). Together, these results provide further 
evidence that cytolysin promotes ethanol-induced liver disease. 
Fig. 4 | Phages that target non-cytolytic E. faecalis do not reduce 
ethanol-induced liver disease in gnotobiotic mice. a, Transmission electron 
microscopy revealed that the phages we isolated were either podophages 
(Ef6.2, Ef6.3, Ef7.2, Ef7.3 and Ef7.4) or siphophages (Ef6.1, Ef6.4 and Ef7.1). Scale 
bar, 50 nm. b-h, C57BL/6 germ-free mice were colonized with faeces from two 
cytolysin-negative patients with alcoholic hepatitis and subjected to the 
chronic-binge feeding model, gavaged with control phages against 
C. crescentus (10¹⁰ PFUs) or a cocktail of four different phages that target 
non-cytolytic E. faecalis (10¹⁰ PFUs), 1 day before an ethanol binge. b, Serum levels of 
ALT. c, Hepatic triglyceride content. d, Representative H&E-stained liver 
sections. Scale bar, 100 µm. e-g, Hepatic levels of mRNAs that encode Il1b, 
Cxcl1 and Col1a1. h, Faecal colony-forming units (CFUs) of Enterococcus. Results 
are expressed as mean ± s.e.m. (b, c, e-h). P values are determined by two-way 
ANOVA with Tukey’s post hoc test (b, c, e-h). All results were generated from at 
least three independent replicates. The exact group size (n) and P values for 
each comparison are listed in Supplementary Table 10. *P < 0.05. 

To determine the mechanism by which cytolysin increases liver disease, 
we isolated hepatocytes from mice fed ethanol or control diets, 
and stimulated them with pure bioactive cytolysin peptides (CylLL″ 
and CylLS″). Incubation of the primary mouse hepatocytes with the 
two cytolysin subunits caused a dose-dependent increase in cell death 
compared to hepatocytes that were incubated with vehicle or with one 
subunit only (Fig. 2h). When we isolated hepatocytes from ethanol-fed 
mice and then incubated these hepatocytes with ethanol, we did not 
observe increased levels of cytolysin-induced cell death compared to 
hepatocytes isolated from mice on the control diet, which indicates that 
cytolysin-induced hepatocyte cell death was independent of ethanol. 
The cytotoxic effects of cytolysin are possibly mediated by pore formation, 
resulting in cell lysis. 


Bacteriophage treatment in liver disease 


To further demonstrate the potential causative role of cytolytic 
E. faecalis for the development of ethanol-induced steatohepatitis, 
we investigated the effects of treatment with bacteriophages (here- 
after, phages). Phages are ubiquitous in bacteria-rich environments, 
including the gut. E. faecalis phages that are highly strain-specific
can be isolated, which potentially makes the direct editing of gut
microbiota feasible. It has previously been shown that Atp4a^Sl/Sl mice,
which lack gastric acid, have overgrowth of intestinal enterococci,
which is associated with increased susceptibility to alcohol-induced
steatohepatitis. The gavaging of wild-type mice with an E. faecalis
strain isolated from Atp4a^Sl/Sl mice led to increased ethanol-induced
steatohepatitis. We found that this same E. faecalis strain expressed
cytolysin. We then isolated four distinct phages from sewage water.
These phages lyse the cytolytic E. faecalis strain isolated from Atp4a^Sl/Sl
mice. All four phages were podophages of the virulent Picovirinae group
(Extended Data Fig. 4). Atp4a^Sl/Sl mice and their wild-type littermates
were then placed on the chronic-binge ethanol diet and gavaged with
the lytic phage cocktail. Phages directed against Caulobacter crescen-
tus, a bacterium that is present in freshwater lakes and streams but
that does not colonize humans or rodents, were used as controls.
Compared to Atp4a^Sl/Sl mice gavaged with control phages or vehicle,
Atp4a^Sl/Sl mice gavaged with phages that target cytolytic E. faecalis
had less severe liver injury, steatosis and inflammation after chronic
ethanol feeding (Extended Data Fig. 5a-f). Administration of E. faecalis
phages significantly reduced levels of cytolysin in the liver (Extended 
Data Fig. 5g) as well as faecal amounts of Enterococcus (Extended Data 
Fig. 5h). Phage administration did not affect the overall composition 
of the faecal microbiome, intestinal absorption or hepatic metabolism 
of ethanol (Extended Data Fig. 5i-k). 

To develop a therapeutic approach to precisely edit the intestinal 
microbiota, we cultured cytolytic E. faecalis strains from the faecal 
samples of patients with alcoholic hepatitis. We then isolated lytic 
phages from sewage water against these cytolytic E. faecalis strains;
these phages had siphophage or myophage morphology (Fig. 3a,
Extended Data Fig. 6). Gnotobiotic mice were colonized with faeces
from two cytolysin-positive patients with alcoholic hepatitis (Sup-
plementary Table 8) and given three or four different, but patient-
specific, lytic phages against cytolytic E. faecalis. The phages against
cytolytic E. faecalis abolished ethanol-induced liver injury and steatosis, 
as shown by lower levels of ALT, lower percentages of hepatic cells 
positive for terminal deoxynucleotide transferase-mediated dUTP 
nick-end labelling, and lower levels of hepatic triglycerides and oil red 
O-staining (Fig. 3b-d, Extended Data Fig. 7a, b), as well as by decreased 
hepatic levels of Il1b, Cxcl1, Cxcl2, Col1a1 and Acta2 mRNAs, and reduced
hepatic levels of cylLS, as compared with mice given control phages
(against C. crescentus) (Fig. 3e-h, Extended Data Fig. 7c, d). Treatment
with phages against cytolytic E. faecalis also reduced faecal amounts
of Enterococcus (Extended Data Fig. 7e) without affecting the overall 
composition of the gut microbiota (Extended Data Fig. 7f). Intestinal 
absorption of ethanol and hepatic metabolism were similar in all groups 
(Extended Data Fig. 7g, h). 

To demonstrate that the effect of phage treatment occurs via the 
targeting of cytolysin-positive E. faecalis, rather than a reduction in
cytolysin-negative E. faecalis, we colonized gnotobiotic mice with
faeces from cytolysin-negative patients with alcoholic hepatitis (Sup-
plementary Table 8). Phages against non-cytolytic E. faecalis from
patients were isolated from sewage water; they had siphophage or 
podophage morphology (Fig. 4a, Extended Data Fig. 8). These phages 
did not reduce features of ethanol-induced liver disease compared with 
control phages (Fig. 4b-g, Extended Data Fig. 9a-h), despite the reduc- 
tion of faecal Enterococcus (Fig. 4h). Our findings indicate that treat- 
ment with lytic phages can selectively attenuate the ethanol-induced 
liver disease caused by cytolysin-positive E. faecalis in humanized mice.


Discussion 


Phage-based therapies have predominantly been studied in patients
with bacterial infections in the gastrointestinal tract, urinary
tract and other organ systems. The results of these studies,
although mixed in terms of efficacy, strongly suggest that phage treat-
ment offers a safe alternative to antibiotics. However, safety studies
are required for complex populations (such as patients with alcoholic
hepatitis), because phages can induce a strong immune reaction.
Further work is required to determine whether phages that target cyto-
lytic E. faecalis might be used to treat patients with alcoholic hepatitis,
a life-threatening disease that at present has no effective treatment.
Eradication of this specific bacterial strain might produce better out-
comes than current treatments, and environmental sources can be
used to easily isolate phages that target cytolysin-positive E. faecalis.
Here we provide an example of the efficacy of approaches based on
phages in mice for a disease that is not considered a classic infectious
disease. Our data also suggest that cytolysin may be used as a predictive
biomarker of severe alcoholic hepatitis; an independent, prospective
cohort is therefore needed to validate cytolysin as a biomarker, and to
extend the phage findings in mice to human patients.

Methods


No statistical methods were used to predetermine sample size. The 
experiments were not randomized and investigators were not blinded 
to allocation during experiments and outcome assessment. 


Patient cohorts 

Patient cohorts have previously been described. We evaluated
26 subjects without alcohol-use disorder (controls; social drinkers
consuming less than 20 g/day), 44 patients with alcohol-use disorder
and 88 patients with alcoholic hepatitis. Patients with alcohol-use
disorder fulfilling the DSM IV criteria of alcohol dependence and
with active alcohol consumption (self-reported >60 g/day) presented 
with various stages of liver disease (21% had advanced F3/4 fibrosis 
based on fibrosis-4 index) (Supplementary Table 1). Patients with 
alcohol-use disorder were recruited from an alcohol withdrawal unit 
in San Diego and Brussels, where they followed a detoxification and 
rehabilitation programme. At admission to the hospital, a complete
medication and medical history was taken, and a complete physical
examination was performed, including collection of bio-specimens,
basic demographic data (such as age, gender, weight and height) and
self-reported daily alcohol consumption. Patients were actively drink- 
ing until the day of admission. Controls or patients with alcohol-use 
disorder did not take antibiotics or immunosuppressive medication 
during the two months preceding enrolment. Other exclusion criteria 
were diabetes, inflammatory bowel disease, known liver disease of any 
other aetiology, and clinically important cardiovascular, pulmonary or 
renal co-morbidities. Patients with alcoholic hepatitis were enrolled 
from the InTeam Consortium (ClinicalTrials.gov identifier number: 
NCT02075918) from centres in the USA, Mexico, UK, France and Spain. 
Inclusion criteria for this study were active alcohol abuse (>50 g/day 
for men and >40 g/day for women) in the past 3 months, aspartate 
aminotransferase (AST) > ALT and total bilirubin >3 mg/dl in the past 3
months, and a liver biopsy and/or clinical picture consistent with alco-
holic hepatitis. Exclusion criteria were autoimmune liver disease (ANA
>1/320), chronic viral hepatitis, hepatocellular carcinoma, complete
portal vein thrombosis, extrahepatic terminal disease, pregnancy and
a lack of signed informed consent. In all patients, the clinical picture
was consistent with alcoholic hepatitis and in patients who underwent 
liver biopsy, the histology was consistent with the diagnosis of alco- 
holic hepatitis. Liver biopsies were only done if clinically indicated 
as part of routine clinical care for diagnostic purposes of alcoholic 
hepatitis. Bio-specimens were collected during their admission to 
the hospital. The median time of specimen collection was 4 days fol- 
lowing admission to the hospital (range 0-24, n= 82). For one patient 
who underwent liver transplantation, the transplantation date was 
considered as date of death. Patients were censored at the time point 
at which they were last seen alive. The baseline characteristics are 
shown in Supplementary Tables 1, 2. Faecal 16S rRNA sequencing, 
Enterococcus culture and qPCR were performed. The MELD score, ABIC 
score and discriminant function were calculated from all alcoholic 
hepatitis patients from whom respective laboratory values were avail-
able. The protocol was approved by the Ethics Committee of Hôpital
Huriez, Universidad Autónoma de Nuevo León, Hospital Universitari
Vall d’Hebron, King’s College London, Yale University, University of 
North Carolina at Chapel Hill, Weill Cornell Medical College, Columbia 
University, University of Wisconsin, VA San Diego Healthcare System, 
University of California San Diego (UCSD) and Université Catholique 
de Louvain. Patients were enrolled after written informed consent was 
obtained from each patient. 
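
The prognostic scores named above (discriminant function, ABIC and MELD) are not written out in the text. As a minimal sketch, assuming the commonly published forms of these formulas (Maddrey's discriminant function, the ABIC score and the original laboratory MELD), they could be computed as below. The function names, the units and the clamping of MELD inputs are assumptions made for illustration and may differ from the variants the authors used.

```python
import math


def discriminant_function(pt_patient_s, pt_control_s, bilirubin_mg_dl):
    """Maddrey's discriminant function (assumed standard form)."""
    return 4.6 * (pt_patient_s - pt_control_s) + bilirubin_mg_dl


def abic_score(age_years, bilirubin_mg_dl, creatinine_mg_dl, inr):
    """ABIC score: age, bilirubin, INR and creatinine (assumed standard form)."""
    return (0.1 * age_years + 0.08 * bilirubin_mg_dl
            + 0.3 * creatinine_mg_dl + 0.8 * inr)


def meld_score(bilirubin_mg_dl, inr, creatinine_mg_dl):
    """Original laboratory MELD (assumed form, unrounded)."""
    # Assumed convention: inputs below 1.0 are raised to 1.0 and
    # creatinine is capped at 4.0 mg/dl before taking logarithms.
    bili = max(bilirubin_mg_dl, 1.0)
    inr_c = max(inr, 1.0)
    creat = min(max(creatinine_mg_dl, 1.0), 4.0)
    return (3.78 * math.log(bili) + 11.2 * math.log(inr_c)
            + 9.57 * math.log(creat) + 6.43)
```

A score is only defined when all of its laboratory inputs are present, which matches the statement that scores were calculated for patients from whom the respective values were available.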


Mice 

C57BL/6 mice were purchased from Charles River and used in Fig. 2h and
Extended Data Fig. 2. C57BL/6 germ-free mice were bred at UCSD and
used in Figs. 2a-g, 3, 4, Extended Data Figs. 3, 7 and 9. Sublytic Atp4a^Sl/Sl
mice on a C57BL/6 background have previously been described and
heterozygous mice were used for breeding; sublytic Atp4a^Sl/Sl littermate
mice and their wild-type littermates were used in Extended Data Fig. 5.

Female and male mice (age of 9-12 weeks) were placed on a chronic-
binge ethanol diet (NIAAA model) as previously described. Mice were
fed with Lieber-DeCarli diet and the caloric intake from ethanol was
0% on days 1-5 and 36% from day 6 until the end of the study period. At 
day 16, mice were gavaged with a single dose of ethanol (5 g/kg body 
weight) in the early morning and killed 9 h later. Pair-fed control mice 
received a diet with an isocaloric substitution of dextrose. 
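
The feeding schedule and binge dose above reduce to simple arithmetic. The sketch below encodes only the numbers stated in this paragraph (0% ethanol calories on days 1-5, 36% from day 6, and a single 5 g/kg gavage); the function names and the example body weight are hypothetical.

```python
def ethanol_calorie_fraction(day):
    """Fraction of dietary calories from ethanol on a given study day."""
    if day < 1:
        raise ValueError("study days are numbered from 1")
    # Days 1-5: 0% of calories from ethanol; day 6 onwards: 36%.
    return 0.0 if day <= 5 else 0.36


def binge_ethanol_grams(body_weight_g, dose_g_per_kg=5.0):
    """Grams of ethanol in the single binge gavage for one mouse."""
    return dose_g_per_kg * body_weight_g / 1000.0


# Example with an assumed 25 g mouse: 5 g/kg corresponds to 0.125 g ethanol.
print(binge_ethanol_grams(25.0))
```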

Stool samples from patients with alcoholic hepatitis (Fig. 1) were 
used for faecal transplantation in germ-free mice. Mice were gavaged 
with 100 µl of stool samples (1 g stool dissolved in 30 ml Luria-Bertani
(LB) medium containing 15% glycerol under anaerobic conditions), 
starting at an age of 5-6 weeks and repeated 2 weeks later. Two weeks 
after the second gavage, mice were placed on the ethanol or control 
(isocaloric) diet. 

In studies of the effects of cytolysin, 5 x 10° CFUs of a cytolytic E. fae-
calis strain (FA2-2(pAM714)), a non-cytolytic E. faecalis strain (FA2-
2(pAM771)) (E. faecalis Δcytolysin) (kindly provided by M.S. Gilmore),
or PBS (vehicle control) were fed to mice by gavage every third day,
starting from day 6 through day 15 of ethanol feeding. Administration
every third day was necessary, given that E. faecalis does not colonize
mice (Extended Data Fig. 20). To determine the effect of phage treat-
ment, 10¹⁰ PFUs of E. faecalis phages (or C. crescentus phage phiCbK as
control) were gavaged to the mice 1 day before the ethanol binge (at 
day 16). All animal studies were reviewed and approved by the Institu- 
tional Animal Care and Use Committee of UCSD. 


Bacteriophage isolation and amplification 

The E. faecalis strain from Atp4a^Sl/Sl mice faeces has previously been
isolated and was used to isolate phages Efmus1, Efmus2, Efmus3 and
Efmus4 (phages specific to the E. faecalis strain isolated from mouse
faeces were named as Efmus with a number: Ef for E. faecalis, mus for
mouse, digit for isolation order). E. faecalis strains from human stool
samples were isolated using methods described below, and the cor-
responding phages were named as Ef with patient number plus a digit
(Ef for E. faecalis, last digit for isolation order). All E. faecalis strains
were grown statically in brain-heart infusion (BHI) broth or on BHI
agar at 37 °C. C. crescentus phage phiCbK was purified as previously
described.

E. faecalis phages were isolated from untreated raw sewage water 
obtained from North City Water Reclamation Plant in San Diego. Fifty 
millilitres of raw sewage water was centrifuged at 8,000g for 1 min 
at room temperature to pellet large particles. The supernatant was 
passed through a 0.45-µm and then a 0.2-µm syringe filter (Whatman,
PES membrane). One hundred microlitres of the clarified sewage was
mixed with 100 µl overnight E. faecalis culture and then added to BHI
broth top agar (0.5% agar) and poured over a BHI plate (1.5% agar). After
overnight growth at 37 °C, the resulting plaques were recovered using
a sterile pipette tip in 500 µl PBS. Phages were replaqued on E. faecalis
three more times to ensure that the phages were clonal isolates.

High-titre phage stocks were propagated by infecting 200 ml of expo-
nentially growing E. faecalis at a multiplicity of infection of 0.1 in BHI
broth containing 10 mM MgSO₄. Lysis was allowed to proceed for up to
six hours at 37 °C with shaking. The lysates were centrifuged at 10,000g
for 20 min at room temperature to remove the remaining bacterial cells
and debris. Supernatant was then vacuum-filtered through a 0.2-µm
membrane filter and kept at 4 °C until use. 
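
A multiplicity of infection (MOI) of 0.1 means one plaque-forming unit for every ten bacterial cells. The sketch below shows that arithmetic for the 200 ml culture described above. The bacterial density and the stock titre are not given in the text, so the values used in the example are purely hypothetical, as are the function names.

```python
def phage_inoculum_pfu(moi, culture_volume_ml, cells_per_ml):
    """PFUs needed to infect a culture at the requested MOI."""
    return moi * culture_volume_ml * cells_per_ml


def stock_volume_ml(required_pfu, stock_titre_pfu_per_ml):
    """Volume of phage stock that contains the required PFUs."""
    return required_pfu / stock_titre_pfu_per_ml


# Assumed density of 1e8 cells/ml in 200 ml, infected at MOI 0.1.
needed = phage_inoculum_pfu(0.1, 200.0, 1e8)
# Assumed stock titre of 1e10 PFU/ml.
print(needed, stock_volume_ml(needed, 1e10))
```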

Before mice were gavaged, 10-20 ml lysates were concentrated using 
Corning Spin-X UF Concentrators with 100,000-molecular weight 
cutoff (MWCO) to a volume of approximately 1 ml. Following concen-
tration, the culture medium was replaced with PBS via diafiltration.
The resulting lysate was further concentrated to a final volume of 
0.5 ml and adjusted to the required PFUs. 


Whole-genome sequencing for phages 

For all phages except Efmus4, 10 ml of lysates were treated with 10 µg/ml
each of DNase and RNase at 37 °C for 1 h, and phages were precipitated
by adding 1 M NaCl and 10% (w/v) polyethylene glycol 8000 (PEG 8000)
and incubating at 4 °C overnight. Precipitated phages were then pel-
leted by centrifugation at 10,000g for 10 min at 4 °C and resuspended
in 500 µl of resuspension buffer (5 mM MgSO₄). Phage DNA was then
extracted using Promega Wizard DNA Clean-up kit (Promega). Phage 
genomes were sequenced using a combination of Illumina and Oxford 
Nanopore Technologies (ONT) MinION platforms. Illumina sequencing 
libraries were prepared using the Nextera XT library kit with bead-based 
size selection before loading onto Illumina flow cells. Sequencing was 
performed with either Illumina MiSeq Reagent Kit v3 in 2 x 300-bp or 
NextSeq 500 Mid Output Kit in 2 x 150-bp paired-end formats. ONT 
MinION sequencing libraries were prepared using the Rapid Barcoding 
Kit (SQK-RBK004) and loaded onto MinION R9.4 flow cells. ONT reads
were basecalled with Albacore v.2.3.4 (ONT). The sequence reads were
demultiplexed and adapters trimmed from ONT reads using Porechop
v.0.2.3. A hybrid Illumina-ONT de novo assembly was performed using
the Unicycler v.0.4.7 pipeline. Subsequently, Pilon v.1.22 was used
iteratively to polish the assemblies with Illumina reads until no addi- 
tional corrections could be made. 

For phage Efmus4, 10° PFUs of the phage was filtered sequentially 
using 0.45-µm and 0.2-µm filters (VWR) and purified on a caesium
chloride (CsCl) density gradient. One millilitre of the CsCl fraction was
purified on Amicon YM-100 protein columns (Millipore) and treated
with DNase I. DNA was isolated using a Qiagen UltraSens virus kit (Qia-
gen), amplified using GenomiPhi V2 (GE Healthcare), and fragmented
to 200 to 400 bp using a Bioruptor (Diagenode). Libraries were created
using the Ion Plus fragment library kit and sequenced using a 316 Chip
on an Ion Torrent Personal Genome Machine (Life Technologies). Reads
were trimmed according to modified Phred scores of 0.5 using CLC
Genomics Workbench 4.9 (Cambridge), and the remaining reads were
assembled using CLC Genomics Workbench 4.9 based on 98% identity
with a minimum of 50% read overlap. Reads were assembled into a
single contig of 18,186 bp (20,118× coverage).

Mapping of ONT reads to the hybrid assemblies was used to deter- 
mine the orientation and terminal ends of linear phage genomes, and 
reference genomes served as guides to orient circular phage genomes. 
Phage genome assemblies were annotated using the NCBI Prokaryotic 
Genome Annotation Pipeline (PGAP).

Phage raw sequence reads and annotated genomes are available at 
NCBI under the following consecutive BioSample IDs (SAMN11089809- 
SAMN11089827). GenBank accession numbers include: Efmus1 
(MK721195), Efmus2 (MK721197), Efmus3 (MK721185), Efmus4 
(MK721193), Ef2.1 (MK693030), Ef2.2 (MK721189), Ef2.3 (MK721192), 
Ef5.1 (MK721199), Ef5.2 (MK721186), Ef5.3 (MK721200), Ef5.4 
(MK721191), Ef6.1 (MK721187), Ef6.2 (MK721188), Ef6.3 (MK721196), 
Ef6.4 (MK721190), Ef7.1 (MK721194), Ef7.2 (MK721183), Ef7.3 (MK721184) 
and Ef7.4 (MK721198). 

Genetic maps of phage genomes were generated by LinearDisplay.pl
(https://github.com/JCVenterInstitute/LinearDisplay), a PERL script
that uses Xfig (https://sourceforge.net/projects/mcj/) to render high-
quality images. Preliminary annotation of genes was derived from the
automated annotation and from Phage Finder, which uses curated
hidden Markov models and databases of core phage genes to annotate
core gene functions. Annotation was then manually reviewed to assign 
the final colours. 


Phage phylogenetic tree 

A phage whole-genome phylogeny tree was generated from a pairwise
distance matrix calculated with the MASH program, which approxi-
mates average nucleotide identity (ANI). First, a sketch file was created
from all the 19 E. faecalis phage genomes isolated and sequenced in
this study plus 54 Enterococcus phage genomes obtained from Gen-
Bank, with 5,000 12-mers generated per genome (mash sketch -k 12
-s 5000). The sketch file was then compared to all the initial phage
genome sequences to generate the ANI matrix using the mash distance
command using default settings. The GGRaSP R package was used to
calculate the UPGMA phylogeny from the ANI distance matrix, after
redundant phage genomes (genomes with ANI >99.985) were removed using
the GGRaSP R package with a user-defined cutoff of 0.015
(ggrasp.cluster(threshold = 0.015)). The resulting dendrogram was translated
into newick format using the APE R package, loaded into the iTOL tree
viewer, and annotated with taxonomic information and manually
entered clade identification.
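
To make the distance step concrete, the sketch below converts a k-mer Jaccard index into a Mash-style distance using the published Mash formula, D = -(1/k) ln(2j / (1 + j)), and flags a pair as redundant at the 0.015 cutoff quoted above. It is a simplified illustration, not the MASH program: it computes the Jaccard index exactly over all k-mers rather than estimating it from a MinHash sketch of 5,000 hashes, and it ignores reverse complements. All function names are hypothetical.

```python
import math


def kmers(seq, k):
    """Set of all k-length substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard index of two k-mer sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mash_distance(j, k):
    """Mash distance for Jaccard index j and k-mer size k."""
    if j <= 0.0:
        return 1.0  # no shared k-mers: maximal distance
    return -math.log(2.0 * j / (1.0 + j)) / k


def is_redundant(distance, threshold=0.015):
    """True when two genomes fall within the redundancy cutoff."""
    return distance < threshold


# Toy sequences only; real genomes would be read from FASTA files.
a = kmers("ACGTACGTTGCAACGT", 4)
b = kmers("ACGTACGTTGCATCGT", 4)
print(mash_distance(jaccard(a, b), 4))
```

Because 1 - D approximates ANI, a small distance threshold corresponds to a high identity cutoff, which is why near-identical genomes are collapsed before the UPGMA tree is built.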


Electron microscopy 

Phage morphology was examined by transmission electron microscopy 
of negatively stained grids, prepared using the Valentine method with
either 2% uranyl-acetate or 2% phosphotungstic acid, and examined 
at an acceleration voltage of 100 kV in the JEOL 1200 EX transmission 
electron microscope. 


Bacterial DNA extraction and 16S rRNA sequencing 

DNA from human stool samples, mouse liver sections or bacterial cul-
ture was extracted as previously described, and DNA from mouse
faeces was extracted using QIAamp Fast DNA Stool kit (Qiagen). 16S
rRNA PCR was completed using Illumina adaptor and barcode-ligated
16S primers targeting the V4 region of the 16S rRNA gene. Amplicons
were purified using the Qiaquick PCR purification kit (Qiagen) using 
manufacturer’s specifications. Purified amplicons were then quantified 
via TECAN assay (Tecan), normalized and pooled in preparation for 16S 
rRNA sequencing. Pooled library was quantified and checked for qual- 
ity using Agilent 2100 Bioanalyzer (Agilent Technologies). Library was 
sequenced on Illumina MiSeq (Illumina) using V2 reagent chemistry, 
500 cycles, 2 x 250-bp format using manufacturer’s specifications. 16S 
sequence reads were processed and operational taxonomic units were 
determined using our MOTHUR-based 16S rDNA analysis workflow 
as previously described. Raw 16S sequence reads can be found in
the NCBI Sequence Read Archive (SRA) associated with Bioproject 
PRJNA525701. 


Real-time qPCR 

Bacterial genomic DNA was extracted from human stool samples and
mouse liver. RNA was extracted from mouse liver and cDNAs were
generated. Primer sequences for mouse genes were obtained from
the NIH qPrimerDepot. Primer sequences for E. faecalis 16S rRNA gene,
E. faecalis cylLL and cylLS genes have previously been described.
All primers used in this study are listed in Supplementary Table 9.
Mouse gene expression and amplification of bacterial genes were
determined with Sybr Green (Bio-Rad Laboratories) using ABI Ste-
pOnePlus real-time PCR system. The qPCR value of mouse genes was
normalized to 18S.
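
The text states only that mouse gene values were normalized to 18S. One common way to do this is the comparative Ct (2^-ΔΔCt) calculation, sketched below as an assumption; the authors may have used a different calculation. The function name and the example Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_18s, ct_target_ctrl, ct_18s_ctrl):
    """Fold change of a target gene versus a control sample (2^-ddCt)."""
    # Normalize each sample to its own 18S reference.
    delta_sample = ct_target - ct_18s
    delta_control = ct_target_ctrl - ct_18s_ctrl
    # Compare the normalized sample with the normalized control.
    return 2.0 ** -(delta_sample - delta_control)


# A target that crosses threshold two cycles earlier than in the control,
# with identical 18S values, corresponds to a four-fold increase.
print(relative_expression(24.0, 10.0, 26.0, 10.0))
```

This form assumes roughly 100% amplification efficiency for both the target and the 18S reference.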


E. faecalis isolation and whole-genome sequencing 
To isolate E. faecalis strains from human subjects, 50-300 mg of human
stool was resuspended in 500 µl PBS, serial dilutions were made and
100 µl was placed on plates with selective medium, BBL Enterococcosel
broth (Becton Dickinson). Enterococci colonies were identified by the
production of dark brown or black colour, generated by hydrolysis
of esculin to esculetin (which reacts with ferric ammonium citrate).
Each Enterococcus colony was then picked, and qPCR was performed
to identify E. faecalis, using specific primers against the E. faecalis 16S
rRNA gene. For each subject, between 1 and 6 E. faecalis colonies were
analysed and bacterial genomic DNA was then extracted as described 
in ‘Bacterial DNA extraction and 16S rRNA sequencing’. 

DNA sequencing was performed on the Illumina HiSeq Ten X generat- 
ing paired-end reads (2 x 151 bp). Bacterial genomes were assembled 
and annotated using the previously described pipeline. Antimicrobial
resistance and virulence genes including cytolysin (cyl) genes carried
by E. faecalis isolates were identified by comparing individual genome
assemblies against the CARD and VFDB databases, respectively, using
abricate v0.8.10 (https://github.com/tseemann/abricate).

For the phylogeny of E. faecalis, the genome assemblies of the study 
isolates were annotated with Prokka, and a pangenome estimated
using Roary. A 95% identity cutoff was used, and core genes were
defined as those in 99% of isolates. A maximum likelihood tree of the
SNPs in the core genes was created using RAxML and 100 bootstraps.
The resulting tree was visualized using iTOL. Genome sequence data
of E. faecalis strains isolated in this study have been deposited in the 
European Nucleotide Archive (ENA) under the accession number 
PRJEB25007. Sequence reads are available at ENA under run acces- 
sion identifiers ERR3200171-ERR3200263. 
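
The core-gene definition above (genes present in 99% of isolates) amounts to a simple filter on a gene presence/absence table. The sketch below illustrates that rule on toy data; it is not Roary itself, and the gene names, isolate identifiers and function name are hypothetical.

```python
def core_genes(presence, n_isolates, min_fraction=0.99):
    """Genes found in at least min_fraction of isolates, sorted by name."""
    return sorted(
        gene
        for gene, isolates in presence.items()
        if len(isolates) / n_isolates >= min_fraction
    )


# Toy presence table: gene name -> set of isolate indices carrying it.
presence = {
    "gyrA": set(range(100)),  # present in all 100 isolates
    "recA": set(range(99)),   # present in 99 of 100 isolates
    "cylM": set(range(40)),   # accessory gene
}
print(core_genes(presence, 100))
```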


E. faecalis culture 

All E. faecalis strains were grown statically in BHI broth or on BHI 
agar plate at 37 °C. Fifty micrograms per millilitre erythromycin was 
added when cytolytic and non-cytolytic E. faecalis strains were grown
(Extended Data Fig. 2). 


Determination of levels of faecal Enterococcus 

To determine levels of faecal enterococci in mice, 10-30 mg of mouse 
faeces was resuspended into 500 µl PBS and serial dilutions were made.
Five microlitres of each dilution from each sample was spotted onto
a plate with a selective medium, BBL Enterococcosel broth (Becton
Dickinson) and the plates were then incubated at 37 °C overnight. For
Extended Data Fig. 20, agar plates contained 50 µg/ml erythromycin.
Enterococci colonies were identified by the production of a dark brown
or black colour. Colony numbers of each sample were then counted, 
and CFUs were calculated. 
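
The CFU calculation is not written out in the text. A minimal sketch of the usual back-calculation from a spot count is given below, assuming counts are scaled by the dilution factor, the spotted volume, the resuspension volume and the faecal mass; the function name and the example numbers are hypothetical.

```python
def cfu_per_gram(colonies, dilution_factor, spotted_volume_ul,
                 resuspension_volume_ul, faeces_mg):
    """CFUs per gram of faeces from a single counted spot."""
    # Colonies per microlitre of the undiluted suspension.
    cfu_per_ul = colonies * dilution_factor / spotted_volume_ul
    # Total CFUs in the whole resuspended sample.
    total_cfu = cfu_per_ul * resuspension_volume_ul
    # Scale to one gram of starting material.
    return total_cfu / (faeces_mg / 1000.0)


# 12 colonies in a 5 µl spot of the 1:1,000 dilution, from 20 mg of
# faeces resuspended in 500 µl PBS.
print(cfu_per_gram(12, 1000, 5.0, 500.0, 20.0))
```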


Cytolysin expression and purification 

To purify bioactive CylLL″ and CylLS″, an Escherichia coli heterologous
expression system was used. In brief, either 6×His-CylLL or 6×His-CylLS
were co-expressed with CylM (the enzyme that performs dehydration
and cyclization reactions on cytolysin) in E. coli to yield fully dehydrated
and cyclized full-length peptides. The His tag and leader peptide were
then removed using recombinant CylA (27-412), the soluble domain
of the native peptidase used in cytolysin maturation, to yield bioactive
CylLL″ or CylLS″. The resulting core peptides were further purified by
reversed-phase high-performance liquid chromatography (HPLC). 

The cylLL and cylLS genes were previously cloned into the MCSI of a
pRSFDuet-1 backbone vector that contained the cylM gene in MCSII.
The cylA (27-412) gene was previously cloned into MCSI of a pRSFDuet-1
backbone vector. E. coli BL21 Star (DE3) cells (50 µl) were transformed
with 100 ng of either the cylLL cylM:pRSFDuet, cylLS cylM:pRSFDuet or
cylA (27-412):pRSFDuet plasmids via KCM chemical transformation.
The cells were plated on LB agar plates supplemented with kanamycin
(50 µg/ml) and grown at 37 °C overnight. One colony was picked to
inoculate 15 ml of LB broth supplemented with kanamycin overnight
at 37 °C. The culture was used to inoculate 1.5 l of terrific broth supple-
mented with kanamycin. Cultures were grown with shaking at 37 °C to
an optical density at 600 nm (OD600) of 0.8. The temperature of the incu-
bator was lowered to 18 °C and expression was induced with the addi-
tion of 0.3 mM final concentration of isopropyl β-D-thiogalactoside.
The cultures were allowed to incubate at 18 °C for 18 h. The cells were 
collected by centrifugation at 5,000g for 12 min. The cell paste was 
collected and frozen at -70 °C. 

For the purification of the protease CylA (27-412), the cell paste 
was thawed and resuspended in 50 ml LanP buffer (20 mM HEPES, 1 M
NaCl, pH 7.5). The cell suspension was lysed by homogenization. The
lysate was clarified by centrifugation at 13,000g for 45 min and filtered
through a 0.45-µm centrifugal filter (Thermo Scientific). The clarified
lysate was applied to a pre-equilibrated HisTrap HP 5 ml column (GE
Healthcare) through a peristaltic pump. The loaded column was con-
nected to an ÄKTA pure 25 M system. The protein was eluted by a linear
gradient of LanP buffer to Elution Buffer (20 mM HEPES, 1 M NaCl, 500
mM imidazole, 10% glycerol, pH 7.5) over 30 min. The purest fractions,
as determined by 4-20% SDS-PAGE, were combined, concentrated
to 1 mg/ml by Amicon Ultra Centrifugal Filters (30 kDa MWCO), and
buffer exchanged into storage buffer (20 mM HEPES, 300 mM KCl, 10%
glycerol, pH 7.5) by PD-10 desalting column (GE Healthcare). Protein 
concentration was determined by absorbance at 280 nm. 

For the purification of CylLL″ and CylLS″ peptides, the cell paste was
thawed and resuspended in 50 ml of LanA Buffer B1 (6 M guanidine
HCl, 20 mM NaH₂PO₄, 500 mM NaCl, 0.5 mM imidazole, pH 7.5). The
cell suspension was lysed via sonication (2-s pulse on, 5-s pulse off,
7 min total pulse on time). The cell lysate was clarified by centrifugation
at 13,000g for 45 min. The clarified cell lysate was filtered through a
0.45-µm centrifugal filter and applied via gravity flow to a pre-equili-
brated, 2 ml bed volume of His60 Ni Superflow Resin (Clontech). After
the lysate had been applied, the resin was washed with 15 ml of LanA
Buffer B2 (4 M guanidine HCl, 20 mM NaH₂PO₄, 500 mM NaCl, 30 mM
imidazole, pH 7.5). The resin was washed again with 15 ml of LanA Wash
Buffer (20 mM NaH₂PO₄, 500 mM NaCl, 30 mM imidazole, pH 7.5) to
remove the guanidine HCl. The peptides were eluted with 10 ml of LanA
elution buffer (20 mM NaH₂PO₄, 500 mM NaCl, 500 mM imidazole,
pH 7.5). A 0.02 mg/ml final concentration of CylA (27-412) was added
to the elution fraction and allowed to incubate at room temperature 
overnight to remove the leader peptide. 

The digestion was quenched by adding 2% (v/v) final concentration of 
trifluoroacetic acid. The solution was centrifuged at 4,500g for 10 min 
and filtered through a 0.45-µm syringe filter (Thermo Scientific). The
core peptides were purified by semi-preparative reverse-phase HPLC
using a Phenomenex Jupiter Proteo column (10 mm x 250 mm, 4 µm, 90
Å) connected to an Agilent 1260 Infinity II liquid chromatography system.
The peptides were separated using a linear gradient of 3% (v/v) solvent
B (acetonitrile + 0.1% trifluoroacetic acid) in solvent A (water + 0.1% trif-
luoroacetic acid). The fractions were spotted on a matrix-assisted laser
desorption/ionization (MALDI) target plate by mixing 1 µl of sample with 1
µl of a 25 mg/ml solution of Super-DHB (Sigma) in 80% acetonitrile/water
+ 0.1% trifluoroacetic acid. The fractions were analysed by MALDI-time of
flight (TOF) mass spectrometry on a Bruker Ultraflextreme MALDI-TOF/
TOF operating in positive ionization, reflector mode. 


Primary mouse hepatocytes 

Hepatocytes were isolated from C57BL/6 female mice fed the chronic-
binge ethanol diet (NIAAA model). Livers were perfused in situ with
calcium-free salt solution containing 0.5 mM EGTA and then perfused
with a solution containing 0.02% (w/v) collagenase D (Roche Applied
Science). Livers were then carefully minced and filtered using a 70-µm
nylon cell strainer. Hepatocytes were centrifuged at 50g for 1 min after
3 washes. Hepatocyte viability was assessed by Trypan Blue (Thermo 
Fisher Scientific). Hepatocytes (1.5 x 10°) were seeded on 12-well plates 
coated with rat collagen type 1 in DMEM-F12 (Thermo Fisher Scientific) 
with insulin-transferrin-selenium (1% v/v) (Thermo Fisher Scientific) 
and 40 ng/ml dexamethasone (MP Biomedicals) containing 10% (v/v) 
fetal bovine serum (FBS) (Gemini Bio-Products) and antibiotics. After 4 
h, the culture was washed with DMEM-F12 medium and changed to the
same complemented medium without FBS. Then 16 h later, hepato-
cytes were cultured with 0 or 25 mM ethanol and stimulated with 0, 200
or 400 nM CylLL″ and/or CylLS″ in the same culture medium without
FBS. After 3 h stimulation, hepatocyte cytotoxicity was assessed using 
Pierce LDH cytotoxicity detection kit (Thermo Fisher Scientific). 


Biochemical analysis 
Serum levels of ALT were determined using Infinity ALT kit (Thermo 
Scientific). Hepatic triglyceride levels were measured using Triglyceride
Liquid Reagents kit (Pointe Scientific). Levels of serum lipopolysac-
charide and faecal albumin were determined by enzyme-linked immu-
nosorbent kits (Lifeome Biolabs and Bethyl Labs, respectively). Serum 
levels of ethanol were measured using ethanol assay kit (BioVision). 


Staining procedures 

Formalin-fixed tissue samples were embedded in paraffin and stained
with H&E. To determine lipid accumulation, liver sections were embed-
ded in OCT compound. Eight-micrometre frozen sections were then cut
and stained with Oil Red O (Sigma-Aldrich). Representative images from 
each group of mice are shown in each figure. The terminal deoxynucleo- 
tide transferase-mediated dUTP nick-end labelling (TUNEL) assay was 
performed using an in situ cell death detection kit (Sigma-Aldrich). We 
randomly selected five high-power fields for counting TUNEL-positive 
cells and normalized numbers to total cells. 


Statistical analysis 

Results are expressed as mean ± s.e.m. (except when stated otherwise).
Univariate and multivariate Cox regression analysis was used to detect 
associations of cytolysin with overall mortality. The multivariate model 
was adjusted for geographical origin of the patients, antibiotic treat- 
ment, platelet count, and creatinine, bilirubin and INR as components 
of the MELD score. Univariate logistic regression analysis of laboratory
and clinical parameters associated with the detection of cytolysin and
E. faecalis was performed. Univariate linear regression analysis of labo-
ratory and clinical parameters associated with the log-transformed
total amount of faecal E. faecalis measured with qPCR was performed.
To associate log-transformed total E. faecalis and E. faecalis positiv-
ity with mortality, univariate Cox regression was used. P values from
univariate and multivariate Cox regression, univariate logistic regres- 
sion and univariate linear regression were determined by Wald test. 
Multicollinearity was examined using the variance inflation factor. 
Kaplan-Meier curves were used to compare survival between cytolysin- 
positive and cytolysin-negative patients with alcoholic hepatitis. Faecal 
E. faecalis, bacterial diversity and richness from controls and patients 
were compared using Kruskal-Wallis test with Dunn’s post hoc test. 
Region- and/or centre-specific clinical characteristics of patients with 
alcoholic hepatitis were compared with Kruskal-Wallis test for continu- 
ous and Fisher’s exact test for categorical variables. Faecal E. faecalis 
in patients with alcoholic hepatitis with or without cytolysin, and with 
or without cirrhosis, were compared with Mann-Whitney-Wilcoxon 
rank-sum test. Faecal £. faecalis in patients with alcoholic hepatitis 
from different region and/or centres were compared with the Kruskal- 
Wallis test. The percentage of subjects with faecal samples that were 
positive for E. faecalis and cytolysin was compared using Fisher’s exact 
test, followed by FDR procedures for multiple group comparisons. 
Jaccard dissimilarity matrices were used for PCoA, and P values were 
determined by PERMANOVA followed by FDR procedures to correct 
for multiple comparisons. 
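
To illustrate what an FDR correction for multiple comparisons does, the sketch below implements the Benjamini-Hochberg step-up adjustment in plain Python. The text refers only to "FDR procedures" and does not name the method, so the choice of Benjamini-Hochberg is an assumption, and the function name and P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Each P value is scaled by m / rank (rank 1 = smallest P value), and
    monotonicity is enforced by walking from the largest P value down.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw P values from four pairwise comparisons.
raw = [0.01, 0.04, 0.03, 0.005]
print(benjamini_hochberg(raw))
```

An adjusted value below 0.05 would then be read as significant at a 5% false discovery rate.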

For mouse and cell culture studies, the significance of multiple
groups was evaluated using one-way or two-way ANOVA with Tukey's
post hoc test. Fisher's exact test was used in the analysis of liver
E. faecalis and cytolysin with FDR correction for multiple comparisons.
Kaplan-Meier curves were used to compare survival between experimental
mouse groups. PCoA based on Jaccard dissimilarity matrices
was performed between experimental mouse groups and the P values
were determined by PERMANOVA followed by FDR procedures to correct
for multiple comparisons.
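
The Jaccard dissimilarity that underlies the PCoA can be sketched as below. This is an illustrative example only: it assumes each sample is reduced to the set of genera detected in it (presence or absence), and the sample names and genus lists are hypothetical.

```python
def jaccard_dissimilarity(a, b):
    """Return 1 - |A ∩ B| / |A ∪ B| for two collections of taxa."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty samples are treated as identical
    return 1.0 - len(a & b) / len(union)


def dissimilarity_matrix(samples):
    """Return (sorted sample names, square matrix of pairwise dissimilarities)."""
    names = sorted(samples)
    matrix = [[jaccard_dissimilarity(samples[r], samples[c]) for c in names]
              for r in names]
    return names, matrix


# Hypothetical genus-level presence lists for three faecal samples.
samples = {
    "mouse_1": ["Enterococcus", "Lactobacillus", "Bacteroides"],
    "mouse_2": ["Lactobacillus", "Bacteroides", "Akkermansia"],
    "mouse_3": ["Enterococcus", "Lactobacillus", "Bacteroides"],
}
names, matrix = dissimilarity_matrix(samples)
for name, row in zip(names, matrix):
    print(name, row)
```

A matrix of this kind is the input to PCoA and PERMANOVA, which are not implemented here.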

Exact P values for all comparisons, together with the group size for each
group, are listed in Supplementary Table 10. Statistical analyses were
performed using R statistical software, R v.3.5.1 (R Foundation for Statistical
Computing) and GraphPad Prism v.6.01. A value of P < 0.05
was considered to be statistically significant (adjusted for multiple
comparisons when performing multiple tests).


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


Raw 16S sequence reads can be found in the NCBI SRA associated with
Bioproject PRJNA525701. Phage raw sequence reads and annotated
genomes are available at NCBI under the following consecutive BioSample
identifiers (SAMN11089809-SAMN11089827). Genome sequence
data of E. faecalis strains isolated in this study were registered at the
ENA under study PRJEB25007.


Code availability 


The PERL script for making the genetic maps of phage genomes can be 
found at https://github.com/JCVenterInstitute/LinearDisplay. 

Extended Data Fig. 1 | Intestinal dysbiosis in patients with alcoholic
hepatitis. a, 16S rRNA sequencing of faecal samples from controls (n = 14),
patients with alcohol-use disorder (n = 43), or alcoholic hepatitis (n = 75). The
graph demonstrates the relative abundance of sequence reads in each genus. b,
Bacterial diversity (Shannon index and Simpson index) and richness (Chao
richness) were calculated in controls (n = 14), patients with alcohol-use disorder
(n = 43) or alcoholic hepatitis (n = 75). c, E. faecalis in faecal samples from
controls (n = 25), patients with alcohol-use disorder (n = 38) or alcoholic
hepatitis (n = 82), assessed by qPCR. d, Percentage of faecal samples positive
for E. faecalis in controls (n = 25), patients with alcohol-use disorder (n = 38) or
alcoholic hepatitis (n = 82), assessed by qPCR. E. faecalis was detected in faeces
from 80% of patients with alcoholic hepatitis, versus 36% of controls
(P < 0.001). There was also a significant difference between patients with
alcohol-use disorder and patients with alcoholic hepatitis (P < 0.01). e, Receiver
operating characteristic curves and area under the curve (AUC) for the
comparison of 90-day mortality and cytolysin positivity (red; n = 57), MELD
score (blue; n = 56), ABIC score (yellow; n = 57) and discriminant function
(green; n = 42) in patients with alcoholic hepatitis. f, E. faecalis in faecal samples
from patients with alcoholic hepatitis whose faecal samples were cytolysin-positive
(n = 25) or cytolysin-negative (n = 54), assessed by qPCR (P = 0.8174).
g, 16S rRNA sequencing of faecal samples from patients with alcoholic hepatitis
from different centres (France, n = 9; Mexico, n = 6; Spain, n = 5; UK, n = 11; USA
(east), n = 16; USA (Midwest), n = 12; USA (west), n = 16 patients). We used PCoA
based on Jaccard dissimilarity matrices to show β-diversity among groups at
the genus level. The composition of faecal microbiota was significantly
different between patients from different regions (P < 0.01). h, Percentage of
faecal samples that were positive for cylLL and cylLS DNA sequences (cytolysin-positive),
in patients with alcoholic hepatitis from different centres (France,
n = 16; Mexico, n = 6; Spain, n = 6; UK, n = 10; USA (east), n = 16; USA (Midwest),
n = 13; USA (west), n = 15 patients), assessed by qPCR (P = 0.6094). i, E. faecalis in
faecal samples from patients with alcoholic hepatitis from different centres,
assessed by qPCR (P = 0.5648). j, Percentage of faecal samples that were
positive for E. faecalis in patients with alcoholic hepatitis from different
centres (France, n = 16; Mexico, n = 6; Spain, n = 6; UK, n = 10; USA (east), n = 16;
USA (Midwest), n = 13; USA (west), n = 15 patients), assessed by qPCR
(P = 0.0529). k, Percentage of subjects with faecal samples that were positive
for cylLL and cylLS DNA sequences (cytolysin-positive), in patients with
alcoholic hepatitis and with (n = 30) or without (n = 18) cirrhosis, assessed by
qPCR (P = 0.3431). l, E. faecalis in faecal samples from patients with alcoholic
hepatitis and with (n = 30) or without (n = 18) cirrhosis, assessed by qPCR
(P = 0.5736). m, Percentage of faecal samples that were positive for E. faecalis in
patients with alcoholic hepatitis and with (n = 30) or without (n = 18) cirrhosis,
assessed by qPCR (P = 0.2878). Results are expressed as mean ± s.e.m.
(c, f, i, l). For the box and whisker plots in b, the box extends from the 25th to
75th percentiles, and the centre line represents the median; for all three
groups, the bottom whiskers show the minimum values; for the control group
(black), the top whisker shows the maximum value; for the other two groups,
the top whiskers represent the 75th percentile plus 1.5× the inter-quartile
distance (the distance between the 25th and 75th percentiles); all values greater
than this are plotted as individual dots. P values were determined by Kruskal-Wallis
test (i) with Dunn's post hoc test (b, c), two-sided Fisher's exact test
(h, j, k, m) followed by FDR procedures (d), two-sided Mann-Whitney-Wilcoxon
rank-sum test (f, l) or PERMANOVA (g). The exact group size (n) and P values for
each comparison are listed in Supplementary Table 10. *P < 0.05, **P < 0.01,
***P < 0.001, ****P < 0.0001.
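
The whisker rule described for the box plots in b (top whisker at the 75th percentile plus 1.5× the inter-quartile distance, larger values drawn as individual dots) can be sketched as follows. This is an illustration under stated assumptions: the values are hypothetical, and the quartile interpolation used here (Python's "inclusive" method) may differ from the one used by the plotting software.

```python
import statistics


def upper_fence_and_outliers(values):
    """Return (upper fence, values above the fence).

    The fence is Q3 + 1.5 * (Q3 - Q1); values above it would be
    plotted as individual dots.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    fence = q3 + 1.5 * (q3 - q1)
    outliers = [v for v in values if v > fence]
    return fence, outliers


# Hypothetical diversity-index values for one group.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
fence, outliers = upper_fence_and_outliers(values)
print(fence, outliers)
```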

Extended Data Fig. 2 | Cytolytic E. faecalis causes the progression of
ethanol-induced liver disease in mice. a-n, C57BL/6 mice were fed oral
isocaloric (control) or chronic-binge ethanol diets and gavaged with vehicle
(PBS), a cytolytic E. faecalis strain (FA2-2(pAM714)) (denoted E. faecalis) (5 × 10⁸
CFUs) or a non-cytolytic E. faecalis strain (FA2-2(pAM771)) (denoted E. faecalis
Δcytolysin) (5 × 10⁸ CFUs) every third day. a, Serum levels of ALT. b, Hepatic
triglyceride content. c, Representative oil red O-stained liver sections.
d-f, Hepatic levels of mRNAs. g, Kaplan-Meier curve of survival of mice on
chronic-binge ethanol diets (day 0 denotes the start of ethanol feeding). Mice
gavaged with PBS all survived, and are not included in the figure. A higher
proportion of mice (n = 15) gavaged with non-cytolytic E. faecalis survived than
did mice (n = 25) gavaged with cytolytic E. faecalis. h, Proportions of mice that
were positive for cytolysin in the liver, measured by qPCR for cylLS (the gene
that encodes cytolysin subunit CylLS″). i, Proportions of mice that were
positive for E. faecalis in the liver, measured by qPCR. About 80% of mice
colonized with cytolytic E. faecalis, as well as those colonized with non-cytolytic
E. faecalis, were positive for E. faecalis in their livers. j, Liver CFUs of
Enterococcus in mice on a chronic-binge ethanol diet. k, Paracellular intestinal
permeability was evaluated by measuring faecal albumin content and serum
levels of lipopolysaccharide (LPS) by enzyme-linked immunosorbent assays.
l, Faecal samples were collected and 16S rRNA genes were sequenced. PCoA
based on Jaccard dissimilarity matrices showed no significant differences
among mice gavaged with PBS, cytolytic or non-cytolytic E. faecalis following
feeding with the control and ethanol diets. Compared to mice fed with a control
diet, mice fed with an ethanol diet had significantly different faecal
microbiomes after gavaging with E. faecalis (P < 0.05). m, n, Serum levels of
ethanol and hepatic levels of Adh1 and Cyp2e1 mRNAs did not differ
significantly among mice gavaged with PBS, cytolytic or non-cytolytic
E. faecalis after ethanol feeding. o, Mice were gavaged with cytolytic or non-cytolytic
E. faecalis strains (carrying the erythromycin resistance gene; 5 × 10⁸
CFUs) at time 0, and faeces were collected 0, 8, 24, 48 and 72 h later. Faecal CFUs
of Enterococcus were determined by culturing faecal samples on BBL
enterococcosel broth agar plates with 50 μg ml⁻¹ erythromycin. At time 0 and
72 h, five out of five and four out of five mice, respectively, had no detectable
erythromycin-resistant Enterococcus in their faeces. These points are not
shown on the graph, but have been included in the calculation of mean ± s.e.m.
Scale bar, 100 μm. Results are expressed as mean ± s.e.m. (a, b, d-f, j, k, m-o).
P values among groups of mice fed with the control or ethanol diet were
determined by one-way ANOVA with Tukey's post hoc test (a, b, d-f, j, k, m, n),
two-sided log-rank (Mantel-Cox) test (g), two-sided Fisher's exact test
followed by FDR procedures (h, i) or PERMANOVA followed by FDR procedures
(l). All results were generated from at least three independent replicates. The
exact group size (n) and P values for each comparison are listed in
Supplementary Table 10. P values between mice fed with a control diet and mice
fed with an ethanol diet were determined by two-way ANOVA (k). *P < 0.05,
**P < 0.01, ***P < 0.001, ****P < 0.0001.

Extended Data Fig. 3 | Transplantation of cytolysin-positive faeces increases
ethanol-induced liver disease in gnotobiotic mice. a-f, h, i, C57BL/6 germ-free
mice were colonized with faeces from two cytolysin-positive and two
cytolysin-negative patients with alcoholic hepatitis, and then fed isocaloric
(control) or chronic-binge ethanol diets. a, Percentage of TUNEL-positive
hepatic cells. b, Representative oil red O-stained liver sections. c, d, Hepatic
levels of mRNAs that encode the inflammatory cytokine Cxcl2 and Acta2 (a
marker of activated hepatic stellate cells). e, Kaplan-Meier curve of survival of
mice on chronic-binge ethanol diets (day 0 denotes the start of ethanol
feeding), gavaged with faeces from cytolysin-positive (n = 48 mice) or
cytolysin-negative (n = 32 mice) patients with alcoholic hepatitis. f, Faecal
samples were collected and 16S rRNA genes were sequenced. The graph shows
PCoA of faecal microbiomes. No significant difference was observed between
mice colonized with faeces from cytolysin-positive or cytolysin-negative
donors with alcoholic hepatitis, following the control diet. Mice transplanted
with faeces from a cytolysin-positive patient with alcoholic hepatitis (patient
no. 2) showed a microbiota that was significantly different to that of the other
mouse groups following ethanol administration (P < 0.01). g, Percentage of
cytolysin-positive E. faecalis in four patients with alcoholic hepatitis. Stool
samples from the four patients were placed on plates with selective medium,
and Enterococcus colonies were identified by the production of a dark brown or
black colour. Enterococcus colonies were confirmed to be E. faecalis by qPCR.
The cytolysin status of each E. faecalis colony was determined by qPCR.
h, Serum levels of ethanol were comparable among colonized mice after
ethanol feeding. i, Hepatic levels of Adh1 and Cyp2e1 mRNAs did not differ
significantly among colonized mice on control or ethanol diets. Scale bar,
100 μm. Results are expressed as mean ± s.e.m. (a, c, d, h, i). P values were
determined by one-way ANOVA with Tukey's post hoc test (a, c, d, h, i), two-sided
log-rank (Mantel-Cox) test (e) or PERMANOVA followed by FDR
procedures (f). All results were generated from at least three independent
replicates. The exact group size (n) and P values for each comparison are listed
in Supplementary Table 10. *P < 0.05, **P < 0.01, ***P < 0.001.

Extended Data Fig. 4 | Isolation and amplification of phages against
cytolytic E. faecalis isolated from mice. a, BHI agar plate showing phage
plaque morphology. The phage cocktail (100 μl) (107-10? PFUs) was mixed with
overnight-grown E. faecalis culture (100 μl) and then added to BHI broth top
agar (0.5% agar) and poured over a BHI plate (1.5% agar). After overnight growth
at 37 °C, images were captured on an Epson Perfection 4990 Photo scanner.
b, Simplified illustration of the morphologies of different phages. Siphophages
have long, flexible noncontractile tails (left); myophages have contractile tails
(middle); and podophages have short noncontractile tails (right).
c, Transmission electron microscopy revealed that phages we isolated were all
podophages (Efmus1, Efmus2, Efmus3 and Efmus4). d, Genetic map of phage
genomes. The linear maps are based on nucleotide sequences of the phage
genomes and predicted open reading frames. The name and length (in bp) of
each genome are indicated to the left of each phage map. Protein-coding
sequences are coloured on the basis of functional role categories. Scale bar,
50 nm. All results were generated from at least three independent replicates.
[Colour key recovered from the figure: DNA polymerase; endolysin/lysin; DNA
binding; hypothetical protein; intron/intein.]

Extended Data Fig. 5 | Phages reduce translocation of cytolysin to the liver
and reduce ethanol-induced liver disease in Atp4aSl/Sl mice. a-k, Wild-type
(WT) and Atp4aSl/Sl littermates were fed oral isocaloric (control) or chronic-binge
ethanol diets, and gavaged with vehicle (PBS), control phages against
C. crescentus (10¹⁰ PFUs) or a cocktail of four different phages that target
cytolytic E. faecalis (10¹⁰ PFUs), 1 day before an ethanol binge. a, Serum levels of
ALT. b, Hepatic triglyceride content. c, Representative oil red O-stained liver
sections. d-f, Hepatic levels of mRNAs. g, Proportions of mice that were
positive for cytolysin in the liver, measured by qPCR for cylLS. h, Faecal CFUs of
Enterococcus. i, Faecal samples were collected and 16S rRNA genes were
sequenced. PCoA based on Jaccard dissimilarity matrices found no significant
difference in faecal microbiota among mice given PBS, control phage or phages
that target cytolytic E. faecalis in each group. j, k, Serum levels of ethanol and
hepatic levels of Adh1 and Cyp2e1 mRNAs did not differ significantly among
colonized mice after ethanol feeding. Scale bar, 100 μm. Results are expressed
as mean ± s.e.m. (a, b, d-f, h, j, k). P values were determined by two-way ANOVA
with Tukey's post hoc test (a, b, d-f, h, j, k), two-sided Fisher's exact test
followed by FDR procedures (g) or PERMANOVA followed by FDR procedures
(i). All results were generated from at least three independent replicates.
The exact group size (n) and P values for each comparison are listed in
Supplementary Table 10. *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001.

Extended Data Fig. 7 | Phages that target cytolytic E. faecalis reduce
ethanol-induced liver disease in gnotobiotic mice. a-h, C57BL/6 germ-free
mice were colonized with faeces from two cytolysin-positive patients with
alcoholic hepatitis (faeces from one patient were also used in Fig. 2). The mice
were then fed oral isocaloric (control) or chronic-binge ethanol diets, and
gavaged with control phages against C. crescentus (10¹⁰ PFUs) or a cocktail
of 3 or 4 different phages that target cytolytic E. faecalis (10¹⁰ PFUs), one day
before an ethanol binge. a, Percentage of TUNEL-positive hepatic cells.
b, Representative oil red O-stained liver sections. c, d, Hepatic levels of mRNAs
that encode the inflammatory cytokine Cxcl2, and Acta2 (a marker of activated
hepatic stellate cells). e, Faecal CFUs of Enterococcus. f, Faecal samples were
collected and 16S rRNA genes were sequenced. PCoA based on Jaccard
dissimilarity matrices shows no significant differences in the faecal microbiota
of mice gavaged with control phage and phages that target cytolytic E. faecalis
in each group. g, h, Serum levels of ethanol and hepatic levels of Adh1 and
Cyp2e1 mRNAs did not differ significantly among colonized mice after ethanol
feeding. Scale bar, 100 μm. Results are expressed as mean ± s.e.m.
(a, c-e, g, h). P values were determined by two-way ANOVA with Tukey's post
hoc test (a, c-e, g, h) or PERMANOVA followed by FDR procedures (f). All results
were generated from at least three independent replicates. The exact group
size (n) and P values for each comparison are listed in Supplementary Table 10.
*P < 0.05, ***P < 0.001.

Extended Data Fig. 8 | Isolation and amplification of phages against non-cytolytic
E. faecalis strains isolated from patients with alcoholic hepatitis.
a, BHI agar plates showing phage plaque morphology. b, Genetic map of phage
genomes. The linear maps are based on nucleotide sequences of the phage
genomes and predicted open reading frames. The name and length (in bp) of
each genome are indicated to the left of each phage map. Protein-coding
sequences are coloured on the basis of functional role categories. Sequences
that encode tRNA genes are indicated by a cloverleaf structure. c, Phylogenetic
tree of Enterococcus phages. A whole-genome average nucleotide distance tree
was constructed for 73 available Enterococcus phage genomes: 54 of these were
from GenBank (denoted by black letters) and 19 were from this study (4 phages
against cytolysin-positive E. faecalis isolated from mice (shown in blue letters);
7 phages against cytolysin-positive E. faecalis isolated from patients with
alcoholic hepatitis (shown in pink letters); and 8 phages against cytolysin-negative
E. faecalis isolated from patients with alcoholic hepatitis (shown in
green letters)) with Mash using a sketch size of s = 5,000 and a k-mer size of
k = 12, and GGRaSP (Methods). Coloured branches denote specific phage
genera or subfamily: Sap6virus, P68virus and Spounavirinae. The scale bar
represents per cent average nucleotide divergence. All results were generated
from at least three independent replicates.

Extended Data Fig. 9 | Phages that target non-cytolytic E. faecalis do not
reduce ethanol-induced liver disease in gnotobiotic mice. a-h, C57BL/6
germ-free mice were colonized with faeces from two cytolysin-negative
patients with alcoholic hepatitis. Transplanted gnotobiotic mice were fed oral
isocaloric (control) or chronic-binge ethanol diets and gavaged with control
phages against C. crescentus (10¹⁰ PFUs) or a cocktail of four different phages
targeting non-cytolytic E. faecalis (10¹⁰ PFUs), 1 day before an ethanol binge.
a, Percentage of TUNEL-positive hepatic cells. b, Representative oil red
O-stained liver sections. c, d, Hepatic levels of mRNAs that encode the
inflammatory cytokine Cxcl2, and Acta2 (a marker of activated hepatic stellate
cells). e, Proportions of mice that were positive for cytolysin in the liver,
measured by qPCR for cylLS. f, Faecal samples were collected and 16S rRNA
genes were sequenced. PCoA based on Jaccard dissimilarity matrices found no
significant difference in faecal microbiota among mice gavaged with control
phages and phages that target cytolytic E. faecalis in each group. g, h, Serum
levels of ethanol and hepatic levels of Adh1 and Cyp2e1 mRNAs did not differ
significantly among colonized mice after ethanol feeding. Scale bar, 100 μm.
Results are expressed as mean ± s.e.m. (a, c, d, g, h). P values were determined
by two-way ANOVA with Tukey's post hoc test (a, c, d, g, h), two-sided Fisher's
exact test followed by FDR procedures (e) or PERMANOVA followed by FDR
procedures (f). All results were generated from at least three independent
replicates. The exact group size (n) and P values for each comparison are listed
in Supplementary Table 10.

Statistics 


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient) 
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals) 


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable. 


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 


Data collection: All biochemical assays were measured using SoftMax Pro 7.0.3; qPCRs were run with the StepOnePlus real-time PCR system; liver histological
pictures were taken with DP Controller and DP Manager (Olympus); phage electron microscopy pictures were taken using Maxim DL5;
plates were scanned using an EPSON 4990 Photo scanner; all pictures were viewed using ImageJ.


Data analysis:

Bacteriophage sequencing and phage tree:
Albacore v2.3.4 (ONT), Porechop v0.2.3, Unicycler v0.4.7 pipeline, Pilon v1.22, CLC Genomics Workbench 4.9, NCBI Prokaryotic Genome
Annotation Pipeline, in-house PERL script using Xfig, Phage_Finder, MASH program, GGRaSP and APE R-package, iTOL tree viewer

16S sequencing:
MOTHUR-based 16S rDNA analysis workflow

E. faecalis genome sequencing and tree:
abricate v0.8.10, Prokka, Roary, RAxML, iTOL

Statistical analyses:
R statistical software 3.5.1, GraphPad Prism v6.01


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 


Data 


Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 


- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability 


Raw 16S sequence reads can be found in the NCBI SRA associated with Bioproject PRJNA525701. Bacteriophage raw sequence reads and annotated genomes are
available at NCBI under the following consecutive BioSample IDs (SAMN11089809-SAMN11089827). Genome sequence data of E. faecalis strains isolated in this
study were registered at ENA under Study PRJEB25007.


Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.

For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf


Sample size: No power analyses or other calculations were used to predetermine sample sizes. Sample sizes were chosen based on prior literature using
similar experimental paradigms (Nat Commun. 2017;8:2137; Gut. 2019;68:1504-1515)


Data exclusions: No data were excluded.


Replication: In vivo experiments: more than two technical replicates (from different cohorts, on different dates), as well as biological replicates, were
performed to ensure data reproducibility.
In vitro experiments: three independent experiments and also replicates were performed on different dates to ensure data reproducibility.
All replications were successful.


Randomization: Mice of similar age and weight were randomly assigned to experimental and control groups.


Blinding: The investigators were not blinded during cell and animal experiment assays.


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems      Methods
n/a | Involved in the study           n/a | Involved in the study
Antibodies                            ChIP-seq
Eukaryotic cell lines                 Flow cytometry
Palaeontology                         MRI-based neuroimaging
Animals and other organisms
Human research participants
Clinical data


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals: Female and male C57BL/6 mice (age, 9-12 weeks) (strain: wild type, Atp4aSl/Sl)

Wild animals: No wild animals were involved in the study.

Field-collected samples: No field-collected samples were involved in the study.

Ethics oversight: All animal studies were reviewed and approved by the Institutional Animal Care and Use Committee of the University of
California, San Diego.


Note that full information on the approval of the study protocol must also be provided in the manuscript. 


Human research participants 


Policy information about studies involving human research participants 


Population characteristics: Alcoholic hepatitis patients were from multiple centers from United States, Mexico and Europe, with the age ranged from 30 to
75. Alcohol use disorder patients and non-alcoholic controls were from United States and Europe, with the age ranged from 27
to 74. Both genders were included in all populations. Detailed descriptions in Methods, Extended Data Tables 1 and 2


Recruitment: Patients with alcohol use disorder fulfilling the DSM IV criteria (J Abnorm Psychol. 1997;106:545-553) were recruited from an
alcohol withdrawal unit in San Diego, USA and Brussels, Belgium where they followed a detoxification and rehabilitation
program. Alcoholic hepatitis patients were enrolled from the InTeam Consortium (ClinicalTrials.gov identifier number:
NCT02075918) from centers in the USA, Mexico, United Kingdom, France and Spain. Detailed inclusion and exclusion criteria are
listed in Methods


Ethics oversight: The protocol was approved by the Ethics Committee of Hôpital Huriez (Lille, France), Universidad Autonoma de Nuevo Leon
(Monterrey, México), Hospital Universitari Vall d'Hebron (Barcelona, Spain), King's College London (London, UK), Yale University
(New Haven, USA), University of North Carolina at Chapel Hill (Chapel Hill, USA), Weill Cornell Medical College (New York, USA),
Columbia University (New York, USA), University of Wisconsin (Madison, USA), VA San Diego Healthcare System (San Diego, USA)
and Université Catholique de Louvain (Brussels, Belgium).


Note that full information on the approval of the study protocol must also be provided in the manuscript. 

Liver cirrhosis is a major cause of death worldwide and is characterized by extensive
fibrosis. There are currently no effective antifibrotic therapies available. To obtain a
better understanding of the cellular and molecular mechanisms involved in disease
pathogenesis and enable the discovery of therapeutic targets, here we profile the
transcriptomes of more than 100,000 single human cells, yielding molecular
definitions for non-parenchymal cell types that are found in healthy and cirrhotic
human liver. We identify a scar-associated TREM2+CD9+ subpopulation of
macrophages, which expands in liver fibrosis, differentiates from circulating
monocytes and is pro-fibrogenic. We also define ACKR1+ and PLVAP+ endothelial cells
that expand in cirrhosis, are topographically restricted to the fibrotic niche and
enhance the transmigration of leucocytes. Multi-lineage modelling of ligand and
receptor interactions between the scar-associated macrophages, endothelial cells
and PDGFRα+ collagen-producing mesenchymal cells reveals intra-scar activity of
several pro-fibrogenic pathways including TNFRSF12A, PDGFR and NOTCH signalling.
Our work dissects unanticipated aspects of the cellular and molecular basis of human
organ fibrosis at a single-cell level, and provides a conceptual framework for the
discovery of rational therapeutic targets in liver cirrhosis.


Recent estimates suggest that 844 million people worldwide have
chronic liver disease, with two million deaths per year and a rising
incidence. Iterative liver injury secondary to any cause leads to progressive
fibrosis and ultimately results in liver cirrhosis. Notably, the
degree of liver fibrosis predicts adverse patient outcomes. Hence,
effective antifibrotic therapies for patients with chronic liver disease
are urgently required.

Liver fibrosis involves a complex interplay between multiple non-parenchymal
cell (NPC) lineages including immune, endothelial and
mesenchymal cells spatially located within areas of scarring, termed
the fibrotic niche. Despite progress in our understanding of liver
fibrogenesis accrued using rodent models, there remains a considerable
'translational gap' between putative targets and effective patient
therapies. This is in part due to limited definition of the functional
heterogeneity and interactome of cell lineages that contribute to the
fibrotic niche of human liver cirrhosis, which is imperfectly recapitulated
by rodent models.

Single-cell RNA sequencing (scRNA-seq) is delivering a step change
in our understanding of disease pathogenesis, allowing the interrogation
of individual cell populations at unprecedented resolution.
Here, we studied the mechanisms that regulate human liver fibrosis
using scRNA-seq.


Single-cell atlas of human liver NPCs 


Hepatic NPCs were isolated from healthy and cirrhotic human livers 
spanning a range of aetiologies of cirrhosis (Fig. 1a, Extended Data 
Fig. 1a). Leucocytes (CD45+) or other NPC (CD45−) fractions (Extended 
Data Fig. 1b) were sorted by flow cytometry before scRNA-seq analysis. 
To discriminate between liver-resident and circulating leucocytes, we 
also performed scRNA-seq on CD45+CD66b− peripheral blood mononuclear 
cells (PBMCs) (Extended Data Fig. 1c, g-i). The combined tissue 
and PBMC dataset was partitioned into clusters (Extended Data Fig. 1d) 
and annotated using signatures of known lineage markers (Extended 
Fig. 1 | Single-cell atlas of human liver NPCs. a, Overview, illustrating the 
isolation, FACS sorting and scRNA-seq analysis of leucocytes (CD45+) and other 
NPC fractions (CD45−). b, Clustering of 66,135 cells from healthy (n=5) and 
cirrhotic (n=5) human livers. c, Annotation by injury condition. d, Cell lineage 
inferred from expression of marker gene signatures. ILC, innate lymphoid cell; 
MP, mononuclear phagocyte; pDC, plasmacytoid dendritic cell. e, Heat map of 
cluster marker genes (top, colour-coded by cluster and condition), with cell lineage 
of exemplar genes labelled (right). Columns denote cells; rows denote genes. 


Data Fig. 1d, e, Supplementary Table 1). To generate an atlas of liver- 
resident cells, contaminating circulating cells were removed from 
the liver tissue datasets, by excluding cells from the tissue samples 
which mapped transcriptionally to blood-derived clusters 1 and 13 
(Extended Data Fig. 1d). Liver-resident cells expressed higher levels 
of tissue-residency markers such as CXCR4 compared with PBMCs 
(Extended Data Fig. 1f). 
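
The removal of circulating cells described above amounts to a filter on cluster membership. The following is a minimal sketch, not the study's pipeline: the record fields `source` and `cluster` and the toy cells are hypothetical, and only the cluster numbers 1 and 13 come from the text.

```python
# Illustrative only: field names and toy records are assumptions.
BLOOD_DERIVED_CLUSTERS = {1, 13}  # blood-derived clusters named in the text


def keep_liver_resident(cells):
    """Keep tissue cells that do not map to a blood-derived cluster."""
    return [
        cell
        for cell in cells
        if cell["source"] == "liver"
        and cell["cluster"] not in BLOOD_DERIVED_CLUSTERS
    ]


cells = [
    {"id": "a", "source": "liver", "cluster": 4},
    {"id": "b", "source": "liver", "cluster": 13},  # tissue cell in a blood cluster
    {"id": "c", "source": "pbmc", "cluster": 1},    # PBMC sample, not liver tissue
]
resident = keep_liver_resident(cells)
```

Only cell `a` survives the filter in this toy input; the other two are excluded as circulating or blood-derived.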

Re-clustering the 66,135 liver-resident cells from 10 livers (n=5 healthy 
and n=5 cirrhotic) revealed 21 populations (Fig. 1b), each containing 
cells from both healthy and cirrhotic livers (Fig. 1c, Extended Data Fig. 2), 
across 10 cell lineages (Fig. 1d, Extended Data Fig. 2a, b). Subpopulation 
markers were identified across all clusters and lineages (Fig. 1e, 
Supplementary Tables 3, 4). Quality control metrics were highly reproducible 
between individual samples and conditions (Extended Data Fig. 2c-f, 
Supplementary Table 2). Expression of collagens type I and III, the main 
fibrillar collagens within the fibrotic niche, was restricted to cells of the 
mesenchymal lineage (Fig. 1e). 

We proceeded to annotate all human liver NPC lineages (below, 
Supplementary Notes 1-3, Extended Data Fig. 3), and provide an 
open-access gene browser (http://www.livercellatlas.mvm.ed.ac.uk) 
that allows assessment of NPC gene expression between healthy and 
cirrhotic livers. 


Distinct macrophages inhabit the fibrotic niche 


Previous studies in rodents have highlighted macrophage subpopu- 
lations that orchestrate both the progression and regression of liver 
fibrosis® ®. Clustering of mononuclear phagocytes (MPs) identified ten 
clusters; annotated as scar-associated macrophages (SAMacs), Kupffer 
cells (KCs), tissue monocytes (TMs), conventional dendritic cells (cDCs) 
and cycling (proliferating) cells (Fig. 2a, Extended Data Fig. 4a, Sup- 
plementary Note 2). Clusters MP(4) and MP(5)—named SAMac(1) and 
SAMac(2), respectively—were expanded in cirrhotic livers (Fig. 2b), as 
confirmed by quantification of the MP cell composition of each liver 
individually (Fig. 2c). 
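
Quantifying MP composition per liver, as described above, can be illustrated as a per-sample fraction calculation. This is a sketch under stated assumptions: the liver identifiers and cell assignments are invented toy data, and the statistical comparison used in the study (Wald test, per the Fig. 2 legend) is not reproduced here.

```python
from collections import Counter, defaultdict


def cluster_fractions(assignments):
    """Map each liver to {cluster: fraction of that liver's MP cells}."""
    counts = defaultdict(Counter)
    for liver, cluster in assignments:
        counts[liver][cluster] += 1
    return {
        liver: {cl: n / sum(per_liver.values()) for cl, n in per_liver.items()}
        for liver, per_liver in counts.items()
    }


# Hypothetical (liver, cluster) pairs, one per cell.
toy = [
    ("H1", "KC(1)"), ("H1", "KC(1)"), ("H1", "SAMac(1)"), ("H1", "TM(1)"),
    ("C1", "SAMac(1)"), ("C1", "SAMac(1)"), ("C1", "SAMac(2)"), ("C1", "KC(1)"),
]
fractions = cluster_fractions(toy)
```

Computing fractions within each liver, rather than pooling all cells, keeps a single large sample from dominating the comparison between conditions.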

Clusters MP(6) and MP(7) were enriched in the expression of CD163, 
MARCO and TIMD4 (Extended Data Fig. 4b); tissue staining confirmed 
these as KCs (resident liver macrophages), facilitating the annota- 
tion of these clusters as KC(1) and KC(2), respectively (Extended Data 
Fig. 4c). A lack of TIMD4 expression distinguished cluster KC(2) from 
KC(1) (Extended Data Fig. 4b); cell counting demonstrated TIMD4+ cell 
numbers to be equivalent between healthy and cirrhotic livers, but 
showed a loss of MARCO+ cells, consistent with a selective reduction in 
MARCO+TIMD4− KC(2) in liver fibrosis (Fig. 2c, Extended Data Fig. 4d, e). 

Scar-associated clusters SAMac(1) and SAMac(2) expressed the unique 
markers TREM2 and CD9 (Fig. 2d, e). These macrophages displayed 
a hybrid phenotype, with features of both TMs and KCs (Fig. 2d, e), 
analogous to monocyte-derived macrophages in mouse liver injury 
models’. Flow cytometry confirmed expansion of TREM2+CD9+ macrophages 
in human fibrotic livers (Fig. 2f, Extended Data Fig. 4f). 
Conditioned medium from SAMacs after fluorescence-activated cell 
sorting (FACS) promoted fibrillar collagen expression by primary human 
hepatic stellate cells (HSCs) (Fig. 2g), indicating that SAMacs have a 
pro-fibrogenic phenotype. Tissue staining demonstrated the presence 
of TREM2+CD9+MNDA+ SAMacs topographically localized in collagen-positive 
scar regions (Fig. 2h, Extended Data Fig. 4g-i), and significantly 
expanded in cirrhotic livers (Extended Data Fig. 4j, k). Cell counting 
of stained cirrhotic livers morphologically segmented into regions of 
fibrotic septae and parenchymal nodules, confirmed SAMac accumulation 
within the fibrotic niche (Extended Data Fig. 4l). 

Local proliferation has a significant role in macrophage expansion at 
sites of fibrosis in rodent models”’. Cycling MP cells (Fig. 2a) subclus- 
tered into subpopulations of conventional dendritic cells (cDC1 and 
cDC2), KCs and SAMacs (Extended Data Fig. 4m, Supplementary Table 8). 
Cycling SAMacs expanded in cirrhosis (Extended Data Fig. 4m), which 
highlights the potential role of macrophage proliferation in promoting 
SAMac accumulation in the fibrotic niche. 


Pro-fibrogenic phenotype of SAMacs 
To delineate the functional profile of SAMacs, we visualized co-ordi- 
nately expressed gene groups across the MP subpopulations using self- 
organizing maps (Extended Data Fig. 5a). We identified six optimally 
differentiating metagene signatures, denoted as A-F (Extended Data 
Fig. 5a, Supplementary Table 9). Signatures A and B defined SAMacs and 
were enriched for ontology terms relevant to tissue fibrosis (Extended 
Data Fig. 5b). These SAMac-defining signatures included genes such as 
TREM2, IL1B, SPP1, LGALS3, CCR2 and TNFSF12, some of which are known 
to regulate the function of scar-producing myofibroblasts in fibrotic 
liver diseases’. The remaining MP subpopulations were defined by 
signature C (KCs), signatures D, E (TMs) and signature F (cDC1); ontology 
terms matched known functions for the associated cell type (Extended 
Data Fig. 5b, Supplementary Table 9). 

In mice, under homeostatic conditions, embryologically derived 
self-renewing tissue-resident KCs predominate” !°. However, after 
Fig. 2 | Identifying SAMac subpopulations. a, Clustering of 10,737 MPs from 
healthy (n=5) and cirrhotic (n=5) human livers. b, Annotation by injury 
condition. c, Fractions of MP subpopulations in healthy (n=5) and cirrhotic 
(n=5) livers. d, Heat map of MP cluster marker genes (top, colour-coded by 
cluster and condition), with exemplar genes labelled (right). Columns denote 
cells; rows denote genes. e, Scaled gene expression of SAMac and TM cluster 
markers across MP cells from healthy (n=5) and cirrhotic (n=5) livers. f, Flow 
cytometry analysis of TREM2+CD9+ MP fraction in healthy (n=2) and cirrhotic 


injury, macrophages derived from circulating monocytes accumulate 
in the liver and regulate fibrosis”*. The ontogeny of human hepatic 
macrophage subpopulations is unknown. TREM2+CD9+ SAMacs demonstrated 
a monocyte-like morphology (Fig. 2h, Extended Data Fig. 4g-i) 
and a distinct topographical distribution from KCs (Extended Data 
Fig. 4l). To assess the origin of SAMacs, we performed in silico trajectory 
analysis on a combined dataset of peripheral blood monocytes 
and liver-resident MPs. We visualized the transcriptional profile of 
these cells (Fig. 3a, Extended Data Fig. 5c), mapped them along a pseu- 
dotemporal trajectory and interrogated their directionality via spliced 
and unspliced mRNA ratios (RNA velocity”). These analyses suggested 
a differentiation trajectory from peripheral blood monocytes into 
either SAMacs or cDCs, with no differentiation from KCs to SAMacs, 
and no progression from SAMacs to KCs (Fig. 3a, Extended Data Fig. 5c). 
Additional RNA velocity analyses” showed downregulation (negative 
velocity) of the monocyte gene MNDA in SAMacs, upregulation (posi- 
tive velocity) of the SAMac marker gene CD9 in TMs, and a lack of KC 
gene TIMD4 velocity in SAMacs (Extended Data Fig. 5d). Furthermore, 
assessment of the probabilities of cells in this dataset transitioning 
into SAMacs indicated a higher likelihood of TMs than KCs differenti- 
ating into SAMacs (Fig. 3b). Overall, these data suggest that SAMacs 
are monocyte-derived, and represent a terminally differentiated cell 
state within the fibrotic niche. 
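
The sign convention used above for RNA velocity (positive for upregulation, negative for downregulation) can be illustrated with the simplified steady-state formulation commonly used for this method, in which velocity is the unspliced abundance minus a gene-specific ratio times the spliced abundance. This is a sketch only: the value of `GAMMA` and the toy abundances are hypothetical, and the study's actual velocity pipeline and parameters are not reproduced.

```python
def rna_velocity(unspliced, spliced, gamma):
    """Steady-state velocity estimate: ds/dt = u - gamma * s."""
    return unspliced - gamma * spliced


def direction(velocity, tol=1e-9):
    """Translate the sign of a velocity into a qualitative call."""
    if velocity > tol:
        return "upregulated"
    if velocity < -tol:
        return "downregulated"
    return "steady"


GAMMA = 0.25  # hypothetical steady-state unspliced/spliced ratio for one gene

# Toy (unspliced, spliced) abundances for one gene in three cells.
examples = {
    "excess_unspliced": rna_velocity(2.0, 4.0, GAMMA),
    "deficit_unspliced": rna_velocity(0.5, 4.0, GAMMA),
    "at_steady_state": rna_velocity(1.0, 4.0, GAMMA),
}
calls = {name: direction(v) for name, v in examples.items()}
```

A cell with more unspliced transcript than the steady state predicts is read as switching the gene on; a deficit is read as switching it off.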

To characterize the SAMac phenotype further, we identified dif- 
ferentially expressed genes along monocyte differentiation trajec- 
tories. We defined three gene co-expression modules, with module 
(n=3) liver. g, Primary human HSCs were treated with conditioned medium 
from SAMacs (n=3) or TMs (n=3), and indicated genes were analysed by 
quantitative PCR (qPCR). Expression is shown relative to mean expression of 
(n>3) of TREM2 (red), CD9 (white), collagen 1 (green) and DAPI (blue) in 
cirrhotic liver. All scale bars, 50 μm. Data are mean ± s.e.m. P values determined 
by Wald test (c) or Kruskal-Wallis and Dunn tests (g). 


1 representing upregulated genes during blood monocyte-to-SAMac 
differentiation (Fig. 3c). Module 1 contained multiple pro-fibrogenic 
genes including SPP1, LGALS3, CCL2, CXCL8, PDGFB and VEGFA ®, and 
displayed ontology terms that are consistent with the promotion of 
tissue fibrosis and angiogenesis (Fig. 3c, d, Supplementary Table 10). 
Module 2 contained genes that were downregulated during the dif- 
ferentiation of monocytes to SAMacs (Fig. 3c, Extended Data Fig. 5e), 
whereas module 3 encompassed a group of upregulated genes during 
the differentiation from monocytes to cDCs (Fig. 3c, Extended Data 
Fig. 5f, Supplementary Table 10). SAMacs isolated from cirrhotic human 
livers (Fig. 2f, Extended Data Fig. 4f) demonstrated enhanced protein 
secretion of several of the mediators identified by transcriptional analy- 
sis (Extended Data Fig. 5g) and promoted fibrillar collagen expression 
by primary human HSCs (Fig. 2g), which confirms that SAMacs have a 
pro-fibrogenic phenotype. 

To enable cross-species comparison, we performed scRNA-seq on 
liver MP cells isolated from control mice or mice treated with chronic 
carbon tetrachloride (CCl4), a mouse model of liver fibrosis’. MP cells 
from fibrotic livers were isolated 24 h after the final CCl4 injection, a time 
of active fibrogenesis’. Five MP cell clusters were defined (Extended 
Data Fig. 6a-d, Supplementary Table 11), and injury-specific cluster 
mMP(2) was differentiated by high expression of Cd9, Trem2, Spp1 and 
Lgals3 (Extended Data Fig. 6a-d). We confirmed expansion of this CD9+ 
mSAMac population in liver fibrosis (Extended Data Fig. 6e, f) and 
co-culture of mSAMacs with quiescent primary mouse HSCs promoted 
fibrillar collagen expression in HSCs (Extended Data Fig. 6g). Canonical 
Fig. 3 | Pro-fibrogenic phenotype of SAMacs. a, Uniform manifold 
approximation and projection (UMAP) visualization of 23,075 cells from liver- 
resident MPs (healthy, n=5; cirrhotic, n=5) and blood monocytes (PBMCs, n=5), 
annotating monocle pseudotemporal dynamics (purple to yellow). RNA velocity 
field (red arrows) visualized using Gaussian smoothing on regular grid. Right, 
annotation of MP subpopulations and injury condition. b, Transition probabilities 
per SAMac subpopulation, indicating for each cell the likelihood of transition into 
either SAMac(1) or SAMac(2), calculated using RNA velocity (yellow, high; purple, 


correlation analysis between human and mouse MP datasets demon- 
strated that human and mouse SAMacs clustered together (Extended 
Data Fig. 6h, i) and that this cluster was enriched for SAMac markers 
CD9, TREM2 and SPP1 (Extended Data Fig. 6j), confirming that mouse 
SAMacs represent a corollary population to human SAMacs. 

To identify potential transcriptional regulators of human SAMacs, 
we defined sets of genes co-expressed with known transcription factors 
(regulons) along the tissue monocyte-to-macrophage pseudotemporal 
trajectory and in KCs (Extended Data Fig. 5g, h, Supplementary Table 12). 
This identified regulons and corresponding transcription factors asso- 
ciated with distinct macrophage phenotypes, highlighting HES7 and 
EGR2 activity in SAMacs. 

To determine whether SAMacs expand in earlier-stage human liver 
disease, we analysed cohorts of patients with non-alcoholic fatty liver 
disease (NAFLD). Application of differential gene expression signatures 
of human SAMacs, KCs and TMs to a deconvolution algorithm” ena- 
bled the assessment of hepatic monocyte-macrophage composition in 
whole liver microarray data across the spectrum of early-stage NAFLD” 
(Extended Data Fig. 7a). This demonstrated expansion of SAMacs in 
patients with non-alcoholic steatohepatitis (NASH) (Extended Data 
Fig. 7a, b), an increased frequency of SAMacs with worsening histological 
NAFLD activity score (NAS) and fibrosis score (Extended Data Fig. 7c), 
but no association with other patient demographics (Extended Data 
Fig. 7d). In a separate NAFLD biopsy cohort, the expansion of SAMacs 
increased with NAFLD activity (Extended Data Fig. 7e) and positively 
correlated with the degree of fibrosis across the full severity spectrum 
of NAFLD-induced liver fibrosis (Extended Data Fig. 7f). 

In summary, these data demonstrate that TREM2+CD9+ SAMacs derive 
from the recruitment and differentiation of circulating monocytes, 
are conserved across species, display a pro-fibrogenic phenotype and 
expand early in the course of liver disease progression. 


Endothelial subpopulations inhabit the fibrotic niche 

In rodent models, hepatic endothelial cells are known to regulate fibrogenesis. 
Clustering of human liver endothelial cells identified seven 
subpopulations (Fig. 4a). Classical endothelial cell markers did not 
discriminate between the seven clusters, although Endo(1) was dis- 
tinct in lacking CD34 expression (Extended Data Fig. 8a). To annotate 
low; grey, below threshold of 2x10). c, Heat map with spline curves fitted to 
genes differentially expressed across blood monocyte-to-SAMac (right arrow) 
and blood monocyte-to-cDC (left arrow) pseudotemporal trajectories, grouped 
by hierarchical clustering (k=3). Gene co-expression modules (colour) and 
exemplar genes from each module are labelled (right). d, Spline curve fitted to 
averaged expression of all genes in module 1 along the monocyte-to-SAMac 
pseudotemporal trajectory (left), with selected enrichment of Gene Ontology 
terms (right). P values determined by Fisher’s exact test. 


endothelial subpopulations fully (Supplementary Note 3, Extended 
Data Fig. 8k), we identified differentially expressed markers (Fig. 4c, 
Supplementary Table 13), determined functional expression profiles 
(Extended Data Fig. 8g, Supplementary Table 14), performed analysis 
Fig. 4 | Identifying scar-associated endothelial subpopulations. a, Clustering of 
8,020 endothelial cells from healthy (n=4) and cirrhotic (n=3) human livers, 
annotating injury condition (right). b, Fractions of endothelial subpopulations 
in healthy (n=4) and cirrhotic (n=3) livers. c, Heat map of endothelial cluster 
marker genes (colour-coded by cluster and condition), with exemplar genes 
labelled (right). Columns denote cells; rows denote genes. d, Representative 
immunofluorescence images (n>3) of CD34 (red), CLEC4M (white), PLVAP (green) 
and DAPI (blue) in healthy and cirrhotic human liver. e, Digital pixel quantification 
of CLEC4M staining in healthy (n=5) and cirrhotic (n=8) liver, PLVAP staining in 
healthy (n=11) and cirrhotic (n=11) liver, and ACKR1 staining in healthy (n=10) 
and cirrhotic (n=10) liver. All scale bars, 50 μm. Data are mean ± s.e.m. P values 
determined by Wald test (b) or two-tailed Mann-Whitney test (e). 
of transcription factor regulons (Extended Data Fig. 8h, Supplementary 
Table 15) and assessed spatial distribution via tissue staining (Fig. 4d, 
Extended Data Fig. 8j). 

Disease-specific endothelial cells Endo(6) and Endo(7) 
(CD34+PLVAP+VWA1+ and CD34+PLVAP+ACKR1+, respectively; Fig. 4a-c, 
Extended Data Fig. 8b) expanded in cirrhotic liver tissue (Fig. 4e) and 
were restricted to the fibrotic niche (Fig. 4d, e, Extended Data Fig. 8c), 
allowing annotation as scar-associated endothelia SAEndo(1) and 
SAEndo(2), respectively. By contrast, CD34−CLEC4M+ Endo(1) cells (annotated 
as liver sinusoidal endothelial cells) were reduced in cirrhotic livers 
(Fig. 4b, e). Metagene signature analysis demonstrated that Endo(6) 
(SAEndo(1)) cells expressed pro-fibrogenic genes including PDGFD, 
PDGFB, LOX and LOXL2; associated ontology terms included extracellular 
matrix organization (signature A; Extended Data Fig. 8g). Endo(7) 
(SAEndo(2)) cells displayed an immunomodulatory phenotype (signature B; 
Extended Data Fig. 8g). The most discriminatory marker for this 
cluster, ACKR1, has a role in regulating leucocyte recruitment”. We confirmed 
increased expression of PLVAP, CD34 and ACKR1 on endothelial 
cells isolated from cirrhotic livers (Extended Data Fig. 8d). Flow-based 
adhesion assays” demonstrated that cirrhotic endothelial cells display 
enhanced leucocyte transmigration (Extended Data Fig. 8e), which was 
attenuated by ACKR1 knockdown (Extended Data Fig. 8f). 


PDGFRA expression defines SAMes cells 


Clustering of human liver mesenchymal cells identified four popu- 
lations (Fig. 5a, b, Extended Data Fig. 9a, Supplementary Table 16). 
Cluster Mes(1), distinguished by MYH11 expression (Fig. 5b, Extended 
Data Fig. 9a), was identified as vascular smooth muscle cells (VSMCs) 
(Fig. 5c). Mes(4) demonstrated expression of mesothelial markers 
(Fig. 5b, Extended Data Fig. 9a). Cluster Mes(2) expressed high 
levels of RGS5 (Fig. 5b, Extended Data Fig. 9a), and RGS5 staining 
identified this population as HSCs (Fig. 5c). RGS5+ cells were absent 
from the fibrotic niche (Fig. 5c). Cluster Mes(3) (distinguished by 
PDGFRA expression) expressed high levels of fibrillar collagens and 
pro-fibrogenic genes (Fig. 5b, d, Extended Data Fig. 9a). PDGFRα+ 
cells expanded in cirrhotic livers (Fig. 5a, e, f) and were mapped to 
the fibrotic niche (Fig. 5f), enabling annotation as scar-associated 
mesenchymal (SAMes) cells. 

To study SAMes cell heterogeneity, further clustering (Extended 
Data Fig. 9b) identified two populations of SAMes cells (Extended Data 
Fig. 9c,d, Supplementary Table 17). OSR1 expression distinguished clus- 
ter SAMesB (Extended Data Fig. 9c), and labelled a subpopulation of 
periportal cells in healthy liver and scar-associated cells in the fibrotic 
niche (Extended Data Fig. 9e, f). Cluster SAMesB also expressed other 
known portal fibroblast markers” (Extended Data Fig. 9g). 

In rodent liver fibrosis models, HSCs differentiate into scar-producing 
myofibroblasts °. Pseudotemporal ordering and RNA velocity analyses 
demonstrated a trajectory from human HSCs to SAMes cells (Extended 
Data Fig. 9h). Assessment of gene co-expression modules along the 
HSC-to-SAMes differentiation continuum indicated upregulation of 
fibrogenic genes including COL1A1, COL1A2, COL3A1 and TIMP1 and 
downregulation of genes including RGS5, IGFBP5, ADAMTS1 and GEM, 
which are known to be downregulated in mouse HSC in response to liver 
injury” (Extended Data Fig. 9i). 


The multi-lineage interactome in the fibrotic niche 


Having defined the populations of scar-associated macrophages, 
endothelial and mesenchymal cells, we confirmed the close topographi- 
cal association of these cells within the fibrotic niche (Extended Data 
Fig. 10a, b), and used CellPhoneDB” to perform an unbiased ligand- 
receptor interaction analysis between these populations. 
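
The ligand-receptor analysis described above rests on a permutation test (as noted in the Fig. 6 legend). The following is a simplified sketch of that idea, not the CellPhoneDB implementation: it scores a ligand-receptor pair as the mean of the average ligand expression in a sender population and the average receptor expression in a receiver population, then compares that score with scores obtained after shuffling cell labels. All function names, labels and expression values are hypothetical.

```python
import random
from statistics import mean


def lr_score(ligand, receptor, labels, sender, receiver):
    """Mean of average ligand (sender) and average receptor (receiver)."""
    lig = [x for x, lab in zip(ligand, labels) if lab == sender]
    rec = [x for x, lab in zip(receptor, labels) if lab == receiver]
    return (mean(lig) + mean(rec)) / 2


def permutation_p(ligand, receptor, labels, sender, receiver,
                  n_perm=1000, seed=0):
    """Fraction of label shuffles scoring at least as high as observed."""
    rng = random.Random(seed)
    observed = lr_score(ligand, receptor, labels, sender, receiver)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps population sizes, breaks identity
        if lr_score(ligand, receptor, shuffled, sender, receiver) >= observed:
            hits += 1
    return observed, hits / n_perm


# Toy data: 4 sender cells, 4 receiver cells, 4 bystander cells.
labels = ["SAMac"] * 4 + ["SAMes"] * 4 + ["other"] * 4
ligand = [5, 6, 5, 6, 0, 0, 0, 0, 0, 0, 0, 0]    # ligand high in SAMac only
receptor = [0, 0, 0, 0, 4, 5, 4, 5, 0, 0, 0, 0]  # receptor high in SAMes only
observed, p_value = permutation_p(ligand, receptor, labels, "SAMac", "SAMes")
```

Shuffling labels rather than expression values preserves each population's size, so the null distribution reflects how specific the pair is to these two populations rather than how highly the genes are expressed overall.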

Numerous statistically significant paracrine and autocrine interac- 
tions were detected between ligands and cognate receptors expressed 
Fig. 5 | Identifying a SAMes subpopulation. a, Clustering of 2,318 
mesenchymal cells from healthy (n=4) and cirrhotic (n=3) human livers, 
annotating injury condition (right). b, Heat map of mesenchymal cluster 
marker genes (top, colour-coded by cluster and condition), with exemplar 
genes labelled (right). Columns denote cells; rows denote genes. c, 
Representative immunofluorescence images (n>3) of RGS5 (red), MYH11 
(white), PDGFRα (green) and DAPI (blue) in healthy and cirrhotic human liver. d, 
Scaled gene expression of fibrillar collagens across mesenchymal cells from 
healthy (n=4) and cirrhotic (n=3) livers. e, Fraction of mesenchymal 
subpopulations in healthy (n=4) and cirrhotic (n=3) livers. f, PDGFRα 
immunohistochemistry (left) and digital pixel quantification (right) in healthy 
(n=11) and cirrhotic (n=11) livers (top) and in fibrotic septae and parenchymal 
nodules in cirrhotic livers (n=11; bottom). All scale bars, 50 μm. Data are 
mean ± s.e.m. P values determined by Wald test (e), two-tailed Mann-Whitney 
test (f, top), or two-tailed Wilcoxon test (f, bottom). 


by SAMac, SAEndo and SAMes cells (Supplementary Table 18, Extended 
Data Fig. 10f-m). To interrogate how scar-associated NPCs regulate 
fibrosis and to identify tractable therapeutic targets, we focused func- 
tional analyses on interactions with SAMes (Fig. 6a, e, Extended Data 
Fig. 10d). In keeping with our data demonstrating that SAMacs pro- 
mote fibrillar collagen expression in HSCs (Fig. 2g), SAMacs expressed 
epidermal growth factor receptor (EGFR) ligands that are known to 
regulate mesenchymal cell activation’’ (Fig. 6a). In addition, SAMacs 
expressed the mesenchymal cell mitogens TNFSF12 and PDGFB, signal- 
ling to cognate receptors TNFRSF12A and PDGFRA on SAMes (Fig. 6a). 
We confirmed localization of these ligand-receptor pairs within the 
fibrotic niche (Fig. 6b). Both TNFSF12 and PDGF-BB induced prolif- 
eration of primary human HSCs, which was inhibited by blockade of 
TNFRSF12A and PDGFRA, respectively (Fig. 6c, d). Conditioned medium 
from primary human SAMacs promoted primary human HSC prolifera- 
tion ex vivo (Extended Data Fig. 10c), demonstrating a functional role 
for SAMacs in regulating SAMes cell expansion. 
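
The proliferation read-out in Fig. 6c, d is the area under the curve (AUC) of the percentage change in HSC number over time. A minimal sketch of such a summary is given here, assuming trapezoidal integration; the integration rule, time points and cell counts are illustrative assumptions and are not reported in the text.

```python
def percent_change(counts):
    """Percentage change of each count relative to the first time point."""
    baseline = counts[0]
    return [100.0 * (c - baseline) / baseline for c in counts]


def trapezoid_auc(times, values):
    """Area under a piecewise-linear curve through (time, value) points."""
    area = 0.0
    for i in range(1, len(times)):
        area += (times[i] - times[i - 1]) * (values[i] + values[i - 1]) / 2.0
    return area


hours = [0, 24, 48]            # hypothetical imaging time points
cell_counts = [100, 150, 200]  # hypothetical HSC counts in one well
auc = trapezoid_auc(hours, percent_change(cell_counts))
```

Summarizing each well as a single AUC value gives one number per replicate, which suits the one-way ANOVA named in the figure legend.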

SAEndo cells expressed high levels of Notch ligands JAG1, JAG2 and 
DLL4 interacting with Notch receptor NOTCH3 on SAMes cells (Fig. 6e). 
NOTCH3 was identified on PDGFRα+ SAMes cells within the fibrotic 
niche (Fig. 6f), and primary endothelial cells from cirrhotic human liver 
demonstrated increased expression of JAG1 (Fig. 6g). Co-culture of primary 
human HSCs and endothelial cells from cirrhotic livers promoted 
fibrillar collagen production by HSCs, which was inhibited by addition 
Fig. 6 | Multi-lineage interactions in the fibrotic niche. a, Dot plot of ligand-receptor 
interactions between SAMac (n=10 human livers) and SAMes (n=7 
human livers) subpopulations. Ligand (red) and cognate receptor (blue) are 
shown on the x axis; cell populations that express ligand (red) and receptor 
(blue) are shown on the y axis. Circle size denotes P value (permutation test); 
colour (red, high; yellow, low) denotes average ligand and receptor 
expression levels in interacting subpopulations. b, Left, representative 
immunofluorescence image (n>3) of TNFRSF12A (red), TNFSF12 (white), 
PDGFRα (green) and DAPI (blue); arrows denote TNFRSF12A+PDGFRα+ cells. 
Right, representative immunofluorescence image (n>3) of TREM2 (red), PDGFB 
(white), PDGFRα (green) and DAPI (blue); arrows denote TREM2+PDGFB+ cells. 
Scale bars, 50 μm. c, d, HSC proliferation assay. The area under the curve (AUC) 
of the percentage change in HSC number over time (hours) is shown on the y 
axis. n=3 (all conditions in c and d). e, Dot plot of ligand-receptor interactions 
between SAEndo (n=7 human livers) and SAMes (n=7 human livers) 
subpopulations as in a. f, Representative immunofluorescence image (n>3) of 
NOTCH3 (red), DLL4 (white), PDGFRα (green) and DAPI (blue) in the fibrotic 
niche; arrows denote NOTCH3+PDGFRα+ cells. Scale bar, 50 μm. g, Flow 
cytometry analysis of JAG1 in endothelial cells from healthy (n=3) or cirrhotic 
(n=9) liver. Left, representative histogram; right, mean fluorescence intensity 
(MFI). h, Co-culture of primary human HSCs and endothelial cells from cirrhotic 
livers, with or without the Notch inhibitor dibenzazepine (DBZ). Left, 
representative immunofluorescence images (n=3) of collagen 1 (COL1; 
magenta), PECAM1 (green) and DAPI (blue). Scale bars, 50 μm. Right, digital pixel 
analysis of the collagen 1 area (n=3). i, Gene knockdown in HSCs using control 
(n=7) or NOTCH3 (n=7) siRNA. Indicated genes were analysed by qPCR, with 
expression relative to mean expression of control siRNA-treated HSCs. Data are 
mean ± s.e.m. P values determined by one-way ANOVA and Tukey test (c, d), 
two-tailed Mann-Whitney test (g, i) or repeated-measures one-way ANOVA and 
Tukey test (h). 


of the Notch-signalling inhibitor dibenzazepine (Fig. 6h). Furthermore, 
knockdown of NOTCH3 expression in primary human HSCs resulted in 
reduced fibrillar collagen expression (Fig. 6i). 

In summary, our unbiased dissection of the key ligand-receptor interactions 
between scar-associated NPCs highlights TNFRSF12A, PDGFRA 
and Notch signalling as important regulators of mesenchymal cell func- 
tion within the human liver fibrotic niche. 


Discussion 


Here, using scRNA-seq and spatial mapping, we resolve the fibrotic 
niche of human liver cirrhosis, identifying pathogenic subpopulations 
of TREM2+CD9+ macrophages, ACKR1+ and PLVAP+ endothelial cells and 
PDGFRα+ collagen-producing myofibroblasts. We dissect a complex, pro-fibrotic 
interactome between multiple scar-associated cell lineages and 
identify highly relevant intra-scar pathways that are potentially druggable. 
In this era of precision medicine, this unbiased multi-lineage approach 
should inform the design of highly targeted combination therapies that 
will very likely be necessary to achieve effective antifibrotic potency** 

Application of our novel scar-associated cell markers could also potentially 
inform molecular pathology-based patient stratification, which is 
fundamental to the prosecution of successful antifibrotic clinical trials. 
Our work illustrates the power of single-cell transcriptomics to decode 
the cellular and molecular basis of human organ fibrosis, providing a 
conceptual framework for the discovery of relevant therapeutic targets 
to treat patients with a broad range of fibrotic diseases. 
METHODS 


Study subjects 

Local approval for procuring human liver tissue and blood samples 
for scRNA-seq, flow cytometry and histological analysis was obtained 
from the NRS BioResource and Tissue Governance Unit (study number 
SR574), following review at the East of Scotland Research Ethics Ser- 
vice (reference 15/ES/0094). All subjects provided written informed 
consent. Healthy background non-lesional liver tissue was obtained 
intraoperatively from patients undergoing surgical liver resection for 
solitary colorectal metastasis at the Hepatobiliary and Pancreatic Unit, 
Department of Clinical Surgery, Royal Infirmary of Edinburgh. Patients 
with a known history of chronic liver disease, abnormal liver function 
tests or those who had received systemic chemotherapy within the last 
four months were excluded from this cohort. Cirrhotic liver tissue was 
obtained intraoperatively from patients undergoing orthotopic liver 
transplantation at the Scottish Liver Transplant Unit, Royal Infirmary 
of Edinburgh. Blood samples were obtained from patients with a confirmed 
diagnosis of liver cirrhosis attending the Scottish Liver Transplant 
Unit, Royal Infirmary of Edinburgh. Patients with liver cirrhosis due 
to viral hepatitis were excluded from the study. Patient demographics 
are summarized in Extended Data Fig. 1a. Isolation of primary hepatic 
macrophage subpopulations and endothelial cells from healthy and 
cirrhotic livers for cell culture and analysis of secreted mediators was 
performed at the University of Birmingham, UK. Local ethical approval 
was obtained (reference 06/Q2708/11, 06/Q2702/61) and all patients 
provided written, informed consent. Liver tissue was acquired from 
explanted diseased livers from patients undergoing orthotopic liver 
transplantation, resected liver specimens or donor livers rejected for 
transplant at the Queen Elizabeth Hospital, Birmingham. For histological 
assessment of NAFLD biopsies, anonymized unstained formalin-fixed 
paraffin-embedded liver biopsy sections encompassing the complete 
NAFLD spectrum were provided by the Lothian NRS Human Annotated 
Bioresource under authority from the East of Scotland Research Ethics 
Service REC 1, reference 15/ES/0094. 


Human tissue processing 

Importantly, to minimize artefacts”, we developed a rapid tissue processing 
pipeline, obtaining fresh non-ischaemic liver tissue taken by 
wedge biopsy before the interruption of the hepatic vascular inflow 
during liver surgery or transplantation, and immediately processing 
this for FACS. This enabled a workflow time of under three hours from 
patient to single-cell droplet encapsulation. 

For human liver scRNA-seq and flow cytometry analyses, a wedge 
biopsy of non-ischaemic fresh liver tissue (2-3 g) was obtained by the 
operating surgeon. This was immediately placed in HBSS (Gibco) on 
ice. The tissue was then transported directly to the laboratory and dis- 
sociation routinely commenced within 20 min of the liver biopsy. To 
enable paired histological assessment, a segment of each liver speci- 
men was also fixed in 4% neutral-buffered formalin for 24 h followed by 
paraffin-embedding. Additional liver samples, obtained via the same 
method, were fixed in an identical manner and used for further histo- 
logical analysis. For human macrophage cell sorting and endothelial cell 
isolation, liver tissue (40 g) was used from cirrhotic patients undergoing 
orthotopic liver transplantation or control samples from donor liver or 
liver resection specimens. 


Mice 

Adult male C57BL/6JCrl mice aged 8-10 weeks were purchased from 
Charles River. Mice were housed under specific pathogen-free conditions 
at the University of Edinburgh. All experimental protocols were approved 
by the University of Edinburgh Animal Welfare and Ethics Board in accordance
with UK Home Office regulations. Liver fibrosis was induced with 4
weeks (nine injections) of twice-weekly intraperitoneal CCl₄ at a dose of
0.4 µl g⁻¹ body weight, diluted 1:3 in olive oil as previously described. Mice
were randomly assigned to receive CCl₄ or to serve as healthy controls.
No sample size calculation or blinding was performed. Liver tissue was
obtained 24 h after the final CCl₄ injection, a time of active fibrogenesis.
Comparison was made to age-matched uninjured mice. 


Preparation of single-cell suspensions 

For human liver scRNA-seq, liver tissue was minced with scissors and 
digested in 5 mg ml⁻¹ pronase (Sigma-Aldrich, P5147-5G), 2.93 mg ml⁻¹
collagenase B (Roche, 11088815001) and 0.019 mg ml⁻¹ DNase (Roche,
10104159001) at 37 °C for 30 min with agitation (200-250 r.p.m.), then
strained through a 120-µm nybolt mesh along with PEB buffer (PBS, 0.1%
BSA and 2 mM EDTA) including DNase (0.019 mg ml⁻¹). Thereafter, all
processing was done at 4 °C. The cell suspension was centrifuged at 400g
for 7 min, supernatant removed, cell pellet resuspended in PEB buffer
and DNase added (0.019 mg ml⁻¹), followed by additional centrifugation
(400g, 7 min). Red blood cell lysis was performed (BioLegend, 420301),
followed by centrifugation (400g, 7 min), resuspension in PEB buffer
and straining through a 35-µm filter. Following another centrifugation at
400g for 7 min, cells were blocked in 10% human serum (Sigma-Aldrich, 
H4522) for 10 min at 4 °C before antibody staining. 

For human liver macrophage flow cytometry analysis and cell sorting, 
and for mouse liver macrophage flow cytometry, cell sorting and scRNA-seq,
single-cell suspensions were prepared as previously described, with
minor modifications. In brief, liver tissue was minced and digested in
an enzyme cocktail of 0.625 mg ml⁻¹ collagenase D (Roche, 11088882001),
0.85 mg ml⁻¹ collagenase V (Sigma-Aldrich, C9263-1G), 1 mg ml⁻¹ dispase
(Gibco, Invitrogen, 17105-041) and 30 U ml⁻¹ DNase (Roche, 10104159001)
in RPMI-1640 at 37 °C for 20 min (mouse) or 45 min (human) with agitation
(200-250 r.p.m.), before being passed through a 100-µm filter.
After lysis of red blood cells (BioLegend, 420301), cells were washed in
PEB buffer and passed through a 35-µm filter. Before the addition of
antibodies, cells from human samples were blocked in 10% human serum
(Sigma-Aldrich, H4522) and mouse samples were blocked in anti-mouse 
CD16/32 antibody (1:100; BioLegend, 101302) and 10% normal mouse 
serum (Sigma, M5905) for 10 min at 4 °C. 

For human PBMC scRNA-seq, 4.9-ml peripheral venous blood samples
were collected in EDTA-coated tubes (Sarstedt, S-Monovette 4.9 ml K3E)
and placed on ice. Blood samples were transferred into a 50-ml Falcon
tube. After lysis of red blood cells (BioLegend, 420301), blood samples
were then centrifuged at 500g for 5 min and supernatant was removed.
Pelleted samples were then resuspended in staining buffer (PBS plus 2%
BSA; Sigma-Aldrich) and centrifugation was repeated. Samples were
then blocked in 10% human serum (Sigma-Aldrich, H4522) in staining
buffer on ice for 30 min. Cells were then resuspended in staining buffer
and passed through a 35-µm filter before antibody staining.


Flow cytometry and cell sorting 

Incubation with primary antibodies was performed for 20 min at 4 °C. All
antibodies, conjugates, lot numbers and dilutions used in this study are
presented in Supplementary Table 19. After antibody staining, cells were
washed with PEB buffer. For human macrophage flow cytometry analysis
and cell sorting, cells were then incubated with streptavidin-BV711 for 20
min at 4 °C (BioLegend 405241; 1:200). For human and mouse cell sorting
(FACS) and mouse flow cytometry analysis, cell viability staining (DAPI;
1:1,000) was then performed, immediately before acquiring the samples.

Human cell sorting for scRNA-seq was performed on a BD Influx (Becton
Dickinson). Viable single CD45⁺ (leucocytes) or CD45⁻ (other
non-parenchymal cells) cells were sorted from human liver tissue (Extended
Data Fig. 1b) and viable CD45⁺CD66b⁻ (PBMC) cells were sorted from
peripheral blood (Extended Data Fig. 1c) and processed for droplet-based
scRNA-seq.

To generate conditioned medium from cirrhotic liver macrophage 
subpopulations, cells were sorted on a BD FACSAria Fusion (Becton 
Dickinson). Sorted SAMacs (viable
CD45⁺Lin⁻HLA-DR⁺CD14⁺CD16⁺CD163⁻TREM2⁺CD9⁺), TMs (viable
CD45⁺Lin⁻HLA-DR⁺CD14⁺CD16⁺CD163⁻TREM2⁻CD9⁻) and KCs (viable
CD45⁺Lin⁻HLA-DR⁺CD14⁺CD16⁺CD163⁺CD9⁻) were plated in 12-well plates
(Corning, 3513) in DMEM (Gibco,
41965039) containing 2% fetal bovine serum (FBS; Gibco, 10500056)
at 1 × 10⁶ cells per ml for 24 h at 37 °C, 5% CO₂. Control wells contained
medium alone. Conditioned medium was collected, centrifuged at
400g for 10 min, and supernatant was stored at −80 °C.

For human macrophage flow cytometry analysis, after surface anti- 
body staining, cells were stained with Zombie NIR fixable viability dye 
(BioLegend, 423105) according to the manufacturer’s instructions. 
Cells were washed in PEB then fixed in Intracellular (IC) Fixation Buffer 
(Thermo Fisher, 00-8222-49) for 20 min at 4 °C. Fixed samples were 
stored in PEB at 4 °C until acquisition. Flow cytometry acquisition was 
performed on a six-laser Fortessa flow cytometer (Becton Dickinson).
The gating strategy is shown in Extended Data Fig. 4f and Fig. 2f. 

Mouse macrophage cell sorting for scRNA-seq and co-culture 
experiments was performed on a BD FACSAria II (Becton Dickinson).
For scRNA-seq, viable CD45⁺Lin(CD3, NK1.1, Ly6G, CD19)⁻ cells were
sorted from healthy (n = 3) and CCl₄-treated (n = 3) mice and processed
for droplet-based scRNA-seq. For transwell co-culture, viable
CD45⁺Lin⁻CD11b⁺F4/80⁺TIMD4⁻CD9⁺ (SAMacs) or CD9⁻ (TMs) cells were
sorted from CCl₄-treated mice (Extended Data Fig. 6e). Flow cytometry
analysis on macrophages from healthy and CCl₄-treated mice was also
performed on a BD FACSAria II (Becton Dickinson), using the same gating
strategy (Extended Data Fig. 6e). All flow cytometry data were analysed 
using FlowJo software (Treestar). 


Luminex assay 

Detection of CCL2, galectin-3, IL-1β, CXCL8 and osteopontin (SPP1) and
CD163 proteins in conditioned medium from human liver macrophage
subpopulations was performed using a custom human Luminex assay
(R&D Systems), according to the manufacturer's protocol. Data were 
acquired using a Bio-Plex 200 (Bio-Rad) and are presented as MFI for 
each analyte. 


Cell culture 

Primary human HSCs (ScienCell, 5300) were cultured in stellate cell 
medium (SteCM, ScienCell, 5301) on poly-L-lysine (Sigma, P4832)-coated 
T75 tissue culture flasks, according to the supplier’s protocol. All experiments
were performed using cells between passages 3 and 5. For assessment
of fibrillar collagen gene expression, HSCs were plated at 75,000
cells per well in 24-well plates (Costar, 3524) in HSC medium consisting
of DMEM (Gibco, 21969-035) with 20 µM HEPES (Sigma, H3375), 2 mM
L-glutamine (Gibco, 25030-024), 1% penicillin streptomycin (Gibco,
15140-122) and 2% FBS (Gibco, 10270). HSCs were serum-starved overnight
(in HSC medium without FBS), washed with PBS, then 250 µl of
conditioned medium from primary human macrophage subpopulations 
was added for 24 h. HSCs were obtained for RNA. 


Human HSC proliferation 

For proliferation assays, after serum starvation, HSCs were obtained 
using TrypLE Express (Gibco, 12604013), re-suspended in HSC medium 
at 2.5 × 10⁴ per ml with Incucyte NucLight Rapid Red (Essen Biosciences,
4717) at a dilution of 1:500 and seeded into 384-well plates (Greiner Bio-One,
781090) at 25 µl per well. HSCs were then treated with (1) control
medium; (2) PDGF-BB (10 ng ml⁻¹; Peprotech, 100-14B) or TNFSF12 (100
ng ml⁻¹; Peprotech, 310-06-5) with or without the PDGFRα inhibitor
crenolanib (1 µM; Cayman Chemicals, CAY1873), anti-TNFRSF12A (2
µg ml⁻¹; Life Technologies, 16-9018-82, clone ITEM-4), mouse IgG2b
kappa isotype control antibody (2 µg ml⁻¹; Life Technologies, 16-4732-82,
clone eBMG2b) or vehicle control as indicated; (3) conditioned
medium from human hepatic macrophage subpopulations as indicated.
The final volume was 50 µl for all conditions. Cells were then
incubated in an Incucyte ZOOM live cell analysis system (Essen Biosciences)
humidified at 37 °C with 5% CO₂ with imaging every 3 h using
the 10x optic for either 87 h (recombinant cytokines/inhibitors) or 39
h (macrophage-conditioned medium). Analysis was performed with
the Incucyte proprietary analysis software (v.2018A) by using machine
learning to distinguish the individual nuclei (stained red by the NucLight
Rapid Red dye) and perform nuclear counts of the images at each
3 h time point over the period of culture. Data are expressed as the AUC
for percentage change in nuclear number from baseline versus time 
(hours), calculated in GraphPad Prism. 
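
As an illustration only (the analysis itself was run in GraphPad Prism), the AUC summary can be sketched in Python. The sketch assumes the trapezoidal rule and takes the first time point as baseline; the function names and the example counts are hypothetical.

```python
def percent_change_from_baseline(counts):
    """Percentage change in nuclear count relative to the first time point."""
    baseline = counts[0]
    return [100.0 * (count - baseline) / baseline for count in counts]


def trapezoid_auc(times_h, values):
    """Area under the curve of values versus time, by the trapezoidal rule."""
    return sum(
        (t1 - t0) * (v0 + v1) / 2.0
        for t0, t1, v0, v1 in zip(times_h, times_h[1:], values, values[1:])
    )


# Hypothetical nuclear counts imaged every 3 h.
times_h = [0, 3, 6, 9]
counts = [100, 110, 125, 150]
auc = trapezoid_auc(times_h, percent_change_from_baseline(counts))
print(auc)  # 180.0 (percentage change x hours)
```

Expressing growth as percentage change before integrating makes wells with different starting nuclear counts comparable.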


Gene knockdown in human HSCs 

Knockdown of NOTCH3 in human HSCs was performed using siRNA. 
HSCs were plated at 75,000 cells per well in a 12-well plate (Costar, 3513) 
followed by serum starvation overnight (in HSC medium without FBS). 
siRNA duplexes with Lipofectamine RNAiMAX Transfection Reagent 
(Thermo Fisher, 13778075) were prepared in OptiMEM (Thermo Fisher, 
31985070) according to the manufacturer’s recommendations, and used 
at a concentration of 50 nM. Cells were exposed to the duplex for 48 h, in
HSC medium containing 2% FBS. Cells were collected for RNA and quanti- 
tative PCR with reverse transcription (RT-qPCR). Knockdown efficiency 
was assessed by NOTCH3 RT-qPCR. The best siRNA for knockdown was 
determined empirically using the FlexiTube GeneSolution kit (Qiagen, 
GS4854). HSCs treated with control siRNA (Qiagen, 1027280) and siRNA 
for NOTCH3 (Qiagen, Hs_NOTCH3_3, SI00009513; knockdown 83%) were
then assessed for fibrillar collagen gene expression. 


Mouse HSC activation 

Primary mouse HSCs were isolated from healthy mice as previously
described. In brief, after cannulation of the inferior vena cava, the
portal vein was cut to allow retrograde step-wise perfusion with pronase
(Sigma, P5147) and collagenase D (Roche, 11088882001)-containing
solutions, before ex vivo digestion in a solution containing pronase,
collagenase D and DNase (Roche, 10104159001). HSCs were isolated
from the digest solution by Histodenz (Sigma, D2158-100G) gradient
centrifugation. HSCs were plated at a density of 400,000 cells per well
in a 24-well plate (Costar, 3524) in HSC medium containing 10% FBS.
After overnight culture, cells were washed with PBS and cultured in HSC
medium containing 2% FBS. For macrophage co-culture, transwell inserts
(0.4-µm polyester membrane; Costar, 3470) were then placed above
adherent HSCs. FACS-sorted CD9⁺ mouse SAMacs or CD9⁻ mouse TMs
from CCl₄-treated mice were resuspended in HSC medium containing
2% FBS at 400,000 cells per ml and 200,000 cells were added to the top
of the transwell insert. Co-culture proceeded for 48 h and HSCs were
collected for RNA. Quiescent HSCs (collected at the start of co-culture)
were used as a control population.


Isolation of human liver endothelial cells 

Human liver endothelial cells were isolated from cirrhotic explant livers 
and non-fibrotic control donor liver as previously described. Endothelial
cells were cultured on plasticware coated with rat-tail collagen (Sigma,
C3867) in complete endothelial medium consisting of endothelial basal
media (Thermo Fisher, 11111044) containing 10% heat-inactivated human
serum (tcsBiosciences, CS100-500), 100 U penicillin, 100 µg ml⁻¹ streptomycin,
2 mM glutamine (Sigma, G6784), VEGF (10 ng ml⁻¹; Peprotech,
100-20) and HGF (10 ng ml⁻¹; Peprotech, 100-39). Expression
of PLVAP, CD34, ACKR1 and JAG1 was assessed using flow cytometry.


Flow-based adhesion assays 

Flow-based adhesion assays were performed as previously described.
In brief, endothelial cells from healthy and cirrhotic liver were seeded
onto a rat-tail collagen-coated Ibidi slide VI 0.4 (Ibidi, 80606) at a density
to give a monolayer and incubated overnight. Peripheral blood was collected
from healthy donors in EDTA-coated tubes. PBMCs were isolated
using a lympholyte density gradient (Cedarlane Laboratories) then
washed in PBS containing 1 mM Ca²⁺, 0.5 mM Mg²⁺ and 0.15% bovine
serum albumin (BSA). Monocytes were enriched from PBMCs using a
pan-monocyte isolation kit (Miltenyi Biotech, 130-096-537) according
to the manufacturer’s protocol. For flow-based adhesion assays, cells
were resuspended at 10⁶ cells per millilitre in endothelial basal media
(Thermo Fisher, 11111044) containing 0.15% BSA, then perfused over
the endothelial cell monolayer for 5 min at 0.28 ml min⁻¹. Non-adherent
cells were washed off during 5 min perfusion of 0.15% BSA human
basal endothelial medium and 10 non-overlapping images were
randomly recorded from each channel. Total adherent (bright-phase;
expressed as cell number per mm² per million cells perfused) and transmigrating
cells (dark-phase; expressed as percentage total adherent
cells) on an endothelial cell monolayer from each patient were counted
and quantified as previously described.
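
The normalization of adherent cell counts can be illustrated with a short Python sketch. It assumes that the number of cells perfused equals concentration × flow rate × perfusion time and that counts are averaged over the recorded images; the function name, the image area and the example counts are hypothetical, and the concentration is passed in as a parameter rather than fixed.

```python
def adherent_per_mm2_per_million(counts_per_image, image_area_mm2,
                                 cells_per_ml, flow_ml_per_min, perfusion_min):
    """Mean adherent cells per mm^2, normalized per million cells perfused."""
    mean_count = sum(counts_per_image) / len(counts_per_image)
    density_per_mm2 = mean_count / image_area_mm2
    # Total cells passed over the monolayer, in millions.
    millions_perfused = cells_per_ml * flow_ml_per_min * perfusion_min / 1e6
    return density_per_mm2 / millions_perfused


# Hypothetical example: 10 images of 0.15 mm^2 each, 21 adherent cells per image,
# cells at 1e6 per ml perfused for 5 min at 0.28 ml per min.
value = adherent_per_mm2_per_million([21] * 10, 0.15, 1e6, 0.28, 5.0)
print(round(value, 1))
```

Dividing by the number of cells perfused allows comparison between runs in which the input cell concentration differs slightly.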


Gene knockdown in endothelial cells 

Knockdown of ACKR1 and PLVAP gene expression in human cirrhotic 
endothelial cells was performed using siRNA as previously described.
In brief, siRNA duplexes for PLVAP, ACKR1 or negative control (Qiagen,
1027280) with Lipofectamine RNAiMAX Transfection Reagent
(Thermo Fisher, 13778075) were prepared in OptiMEM (Thermo Fisher,
31985070) according to the manufacturer’s recommendations, and
used at a concentration of 25 nM. Cells were exposed to the duplex for
4 h at 37 °C, after which time the medium was replaced with endothelial
basal medium containing 10% heat-inactivated human serum for 24 h.
The medium was then replaced with complete endothelial medium and
incubated at 37 °C with 5% CO₂ for a further 24 h. Knockdown efficacy was
assessed by flow cytometry and the MFI (Extended Data Fig. 8f). The best
siRNA for knockdown was determined empirically using the FlexiTube
GeneSolution kit (Qiagen, GS83483 (PLVAP) and GS2532 (ACKR1)). For
flow-based adhesion assays, siRNAs against PLVAP (Qiagen, Hs_PLVAP_1,
SI00687547; knockdown 50.6%), ACKR1 (Qiagen, Hs_Fy_5, SI02627667;
knockdown 37.7%) or control siRNA were selected. Then, 90,000
endothelial cells from cirrhotic patients (n = 6) were seeded into channels
of a rat-tail collagen-coated Ibidi slide VI 0.4 and gene knockdown was
performed, followed by flow-based adhesion assay as described above. 


Co-culture of endothelial cells and HSCs 

HSCs (15,000 cells) were seeded onto an Ibidi slide VI 0.4 with and without
primary human endothelial cells (15,000 cells) from individual
patients with cirrhosis (n = 3) in complete endothelial medium. After 
2h, all growth factor supplements were removed and cells were cultured 
for a further 72 h in endothelial basal medium containing 10% heat- 
inactivated human serum with or without the Notch signalling inhibitor 
dibenzazepine (10 µM; Bio-Techne, 4489/10) or vehicle (DMSO) control.
Cells were fixed in 4% paraformaldehyde (PFA) for 30 min, permeabilized 
with 0.3% Triton X-100 in PBS for 5 min and blocked with 10% goat serum 
in PBS for 30 min followed by primary antibody incubation (mouse anti-PECAM1
and rabbit anti-collagen 1; see Supplementary Table 19) for 1 h.
Cells were washed in 0.1% Triton X-100 in PBS followed by addition
of fluorescently conjugated secondary antibodies (1:500 dilution) for 
1h. Cells were mounted with Pro-long Gold anti-fade DAPI, images were 
taken on the Confocal Microscope Zeiss LSM780, and the collagen 1 
staining area was quantified using IMARIS. 


RNA extraction and RT-qPCR 

RNA was isolated from HSCs using the RNeasy Plus Micro Kit (Qiagen, 
74034) and cDNA synthesis performed using the QuantiTect Reverse 
Transcription Kit (Qiagen, 205313) according to the manufacturer’s 
protocol. Reactions were performed in triplicate in 384-well plate format
and were assembled using the QIAgility automated pipetting system 
(Qiagen). RT-qPCR for human HSCs was performed using PowerUp 
SYBR Green Master Mix (Thermo Fisher, A25777) with the following 
primers (all Qiagen): GAPDH (QT00079247), COL1A1 (QT00037793),
COL3A1 (QT00058233) and NOTCH3 (QT00003374). RT-qPCR for 
mouse HSCs was performed using TaqMan Fast Advanced Master 
Mix (Thermo Fisher, 4444557) with the following primers: Gapdh 
(Thermo Fisher, Mm99999915_g1) and Col3a1 (Thermo Fisher,
Mm00802300_m1). Samples were amplified on an ABI 7900HT FAST
PCR system (Applied Biosystems, Thermo Fisher Scientific). Data were
analysed using Thermo Fisher Connect cloud qPCR analysis software
(Thermo Fisher Scientific). The 2^(−ΔΔCt) quantification method, using
GAPDH for normalization, was used to estimate the amount of target
mRNA in samples, and expression was calculated relative to average mRNA
expression levels from control samples. 
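
For readers unfamiliar with the relative quantification step, a minimal Python sketch follows. It assumes that the control reference is the mean ΔCt of the control samples (averaging at the ΔCt level is an assumption; the text does not specify it). The function name and Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_reference, control_delta_cts):
    """Fold change of a target gene relative to the mean of control samples.

    delta Ct       = Ct(target) - Ct(reference gene, for example GAPDH)
    delta delta Ct = delta Ct(sample) - mean delta Ct(controls)
    fold change    = 2 ** (-delta delta Ct)
    """
    delta_ct = ct_target - ct_reference
    mean_control_delta_ct = sum(control_delta_cts) / len(control_delta_cts)
    return 2.0 ** (-(delta_ct - mean_control_delta_ct))


# Hypothetical values: three control wells and one treated sample.
control_delta_cts = [5.0, 5.5, 4.5]
fold_change = relative_expression(24.0, 20.0, control_delta_cts)
print(fold_change)  # 2.0, that is, twice the control expression
```

A sample whose ΔCt is one cycle lower than the control mean is therefore reported as a twofold increase.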


Immunohistochemistry, immunofluorescence and single- 
molecule FISH 

Formalin-fixed paraffin-embedded human liver tissue was cut into 
4-um sections, dewaxed, rehydrated, then incubated in 4% neutral- 
buffered formalin for 20 min. After heat-mediated antigen retrieval in 
pH 6 sodium citrate (microwave; 15 min), slides were washed in PBS and 
incubated in 4% hydrogen peroxide for 10 min. Slides were then washed 
in PBS, blocked using protein block (GeneTex, GTX30963) for 1 h at room
temperature before incubation with primary antibodies for 1 h at room
temperature. A full list of primary antibodies and conditions is shown 
in Supplementary Table 19. Slides were washed in PBS plus 0.1% Tween 
20 (PBST; Sigma-Aldrich, P1379) then incubated with ImmPress HRP 
Polymer Detection Reagents (depending on species of primary; rabbit, 
MP-74.01; mouse, MP-64.02-15; goat, MP-7405; all Vector Laboratories) 
for 30 min at room temperature. Slides were washed in PBS followed by 
detection. For DAB staining, sections were incubated with DAB (DAKO, 
K3468) for 5 min and washed in PBS before a haematoxylin (Vector Laboratories,
H3404) counterstain. For multiplex immunofluorescence
staining, following the incubation with ImmPress and PBS wash, initial
staining was detected using Cy3, Cy5, or fluorescein tyramide (Perkin-Elmer,
NEL741B001KT) at a 1:1,000 dilution. Slides were then washed in
PBST followed by further heat treatment with pH 6 sodium citrate (15 
min), washes in PBS, protein block, incubation with the second primary 
antibody (incubated overnight at 4 °C), ImmPress Polymer and tyramide 
as before. This sequence was repeated for the third primary antibody 
(incubated at room temperature for 1 h) and a DAPI-containing mountant
was then applied (Thermo Fisher Scientific, P36931).

For AMEC staining (only CLEC4M immunohistochemistry), all washes
were carried out with TBST (dH₂O, 200 mM Tris, 1.5 M NaCl, 1% Tween-20
(all Sigma-Aldrich) pH 7.5) and peroxidase blocking was carried out for
30 min in 0.6% hydrogen peroxide in methanol. Sections were incubated
with AMEC (Vector Laboratories, SK-4285) for 20 min and washed in TBST
before a haematoxylin (Vector Laboratories, SK-4285) counterstain. 

For combined single-molecule fluorescent in situ hybridization 
(smFISH) and immunofluorescence, detection of TREM2 was performed
using the RNAscope 2.5 LS Reagent Kit BrownAssay (Advanced Cell
Diagnostics) in accordance with the manufacturer’s instructions. In
brief, 5-µm tissue sections were dewaxed, incubated with endogenous
enzyme block, boiled in pre-treatment buffer and treated with protease,
followed by target probe hybridization using the RNAscope LS 2.5
Hs-TREM2 (420498, Advanced Cell Diagnostics) probe. Target RNA was
then detected with Cy3 tyramide (Perkin-Elmer, NEL744B001KT) at a
1:1,000 dilution. The sections were then processed through a pH 6 sodium
citrate heat-mediated antigen retrieval, hydrogen peroxide treatment
and protein block (all as for multiplex immunofluorescence staining as
above). MNDA antibody was applied overnight at 4 °C, completed using
a secondary ImmPress HRP Anti-Rabbit Peroxidase IgG (Vector Laboratories,
MP7401), visualized using a fluorescein tyramide (Perkin-Elmer,
NEL741B001KT) at a 1:1,000 dilution and stained with DAPI.

Bright-field and fluorescently stained sections were imaged using the 
slide scanner AxioScan.Z1 (Zeiss) at 20x magnification (40x magnifica- 
tion for smFISH). Images were processed and scale bars added using Zen 
Blue (Zeiss) and Fiji software.


Cell counting and image analysis 
Automated cell counting was performed using QuPath software. In
brief, DAB-stained whole tissue section slide-scanned images (CZI files)
were imported into QuPath. Cell counts were carried out using the positive
cell detection tool, detecting haematoxylin-stained nuclei and then
thresholding for positively stained DAB cells, generating DAB-positive
cell counts per mm² tissue. Identical settings and thresholds were applied
to all slides for a given stain and experiment. For cell counts of fibrotic
septae versus parenchymal nodules, the QuPath segmentation tool
was used to segment the DAB-stained whole tissue section into fibrotic
septae or non-fibrotic parenchymal nodule regions using tissue morphological
characteristics (Fig. 2j). Positive cell detection was then applied
to the fibrotic and non-fibrotic regions in turn, providing DAB-positive
cell counts per mm² in fibrotic septae and non-fibrotic parenchymal
nodules for each tissue section. 

Digital morphometric pixel analysis was performed using the Trainable
Weka Segmentation (TWS) plugin in Fiji software. In brief, each
stained whole tissue section slide-scanned image was converted into
multiple TIFF files in Zen Blue software (Zeiss). TIFF files were imported
into Fiji and the TWS plugin was trained to produce a classifier which segments
images into areas of positive staining, tissue background and white
space. The same trained classifier was then applied to all TIFF images
from every tissue section for a particular stain, providing a percentage 
area of positive staining for each tissue section. For digital morphometric 
quantification of positive staining of fibrotic septae versus parenchymal 
nodules, TIFF images were segmented into fibrotic septae or non-fibrotic
parenchymal nodule regions using tissue morphological characteristics, 
followed by analysis using the TWS plugin in Fiji software. 


Histological assessment of NASH sections 

Sections stained with haematoxylin and eosin or picrosirius red were 
whole-slide imaged using a NanoZoomer imager (Hamamatsu Pho- 
tonics). Images of stained sections were independently scored by a 
consultant liver transplant histopathologist (T.J.K.) at the national 
liver transplant centre with experience in trial scoring by applying the 
ordinal NAFLD activity score. For observer-independent quantification
of the area of positive picrosirius red staining, images were split
using ndpisplit into tiles of x5 magnification before the application
of a classifier that had been trained by the liver histopathologist using
the machine learning WEKA plugin in FIJI, as previously described.
All analysis was undertaken blinded to all other data. 


Droplet-based scRNA-seq 

Single cells were processed through the Chromium Single Cell Plat- 
form using the Chromium Single Cell 3’ Library and Gel Bead Kit v2 (10X 
Genomics, PN-120237) and the Chromium Single Cell A Chip Kit (10X 
Genomics, PN-120236) as per the manufacturer’s protocol. In brief, sin- 
gle cells were sorted into PBS plus 0.1% BSA, washed twice and counted 
using a Bio-Rad TC20. Then, 10,800 cells were added to each lane of the 
10X chip. The cells were partitioned into Gel Beads in Emulsion in the 
Chromium instrument, in which cell lysis and bar-coded reverse tran- 
scription of RNA occurred, followed by amplification, fragmentation 
and 5’ adaptor and sample index attachment. Libraries were sequenced 
on an Illumina HiSeq 4000.


Computational analysis 

In total, we analysed 67,494 human cells from healthy (n = 5) and cirrhotic
(n = 5) livers, 30,741 PBMCs from patients with cirrhosis (n = 4)
and compared our data with a publicly available reference dataset of 
8,381 PBMCs from a healthy donor (https://support.10xgenomics.com/ 
single-cell-gene-expression/datasets/2.1.0/pbmc8k). 


Pre-processing scRNA-seq data 

We aligned to the GRCh38 and mm10 (Ensembl 84) reference genomes 
as appropriate for the input dataset, and estimated cell-containing parti- 
tions and associated unique molecular identifiers (UMIs), using the Cell 
Ranger v.2.1.0 Single-Cell Software Suite from 10X Genomics. Genes 
expressed in fewer than three cells in a sample were excluded, as were
cells that expressed fewer than 300 genes or had a mitochondrial gene
content >30% of the total UMI count. We normalized by dividing the UMI
count per gene by the total UMI count in the corresponding cell and
log-transforming. Variation in UMI counts between cells was regressed
according to a negative binomial model, before scaling and centring the
resulting value by subtracting the mean expression of each gene and
dividing by its standard deviation (Eₙ), then calculating ln(10⁴ × Eₙ + 1).
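
The quality-control thresholds and the library-size normalization can be sketched in plain Python. This is an illustration under stated assumptions, not the pipeline itself, which used Cell Ranger and Seurat. Mitochondrial genes are assumed to be recognisable by an "MT-" prefix, detection counts are computed before gene filtering, and the negative binomial regression and per-gene scaling steps are omitted. The function name is hypothetical.

```python
import math


def filter_and_normalize(counts, gene_names, min_cells=3, min_genes=300,
                         max_mito_fraction=0.30, scale=1e4):
    """Filter a cells x genes UMI matrix (list of lists), then log-normalize.

    Returns (kept_gene_names, kept_cell_indices, normalized_rows).
    """
    n_genes = len(gene_names)
    # Keep genes detected (count > 0) in at least `min_cells` cells.
    keep_gene = [sum(1 for cell in counts if cell[g] > 0) >= min_cells
                 for g in range(n_genes)]
    # Assumption: mitochondrial genes carry an "MT-" (human) or "mt-" (mouse) prefix.
    is_mito = [name.upper().startswith("MT-") for name in gene_names]

    kept_cells, normalized = [], []
    for index, cell in enumerate(counts):
        total = sum(cell)
        detected = sum(1 for c in cell if c > 0)
        mito = sum(c for c, m in zip(cell, is_mito) if m)
        # Drop cells with too few detected genes or too much mitochondrial signal.
        if total == 0 or detected < min_genes or mito / total > max_mito_fraction:
            continue
        kept_cells.append(index)
        # ln(scale x fraction of the cell's UMIs + 1) for each retained gene.
        normalized.append([math.log(scale * c / total + 1.0)
                           for c, keep in zip(cell, keep_gene) if keep])
    kept_genes = [name for name, keep in zip(gene_names, keep_gene) if keep]
    return kept_genes, kept_cells, normalized
```

With the default arguments the function applies the thresholds stated above (three cells, 300 genes, 30% mitochondrial content); smaller values are convenient for testing on toy matrices.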


Dimensionality reduction, clustering and differential expression 
analysis 

We performed unsupervised clustering and differential gene expression
analyses in the Seurat R package v.2.3.0. In particular, we used
shared nearest neighbour graph-based clustering, in which the graph
was constructed using from 2 to 11 principal components as determined
by dataset variability shown in principal component analysis (PCA); the 
resolution parameter to determine the resulting number of clusters 
was also tuned accordingly. To assess cluster similarity we used the 
‘BuildClusterTree’ function from Seurat. 

In total, we present scRNA-seq data from ten human liver samples 
(named healthy 1-5 and cirrhotic 1-5), five human blood samples (n = 4
cirrhotic named blood 1-4 and n = 1 healthy named PBMC8K; pbmc8k
dataset sourced from single-cell gene expression datasets hosted by
10X Genomics), and two mouse liver samples (n = 3 uninjured and n = 3
fibrotic). For seven human liver samples (healthy 1-4 and cirrhotic
1-3), we performed scRNA-seq on both leucocytes (CD45⁺) and other
non-parenchymal cells (CD45⁻); for the remaining three human livers
(healthy 5, cirrhotic 4-5) we performed scRNA-seq on leucocytes only 
(Extended Data Fig. 2e, f). 

Initially, we combined all human scRNA-seq datasets (liver and blood) 
and performed clustering analysis with the aim of isolating a population 
of liver-resident cells, by identifying contaminating circulatory cells 
within datasets generated from liver digests and removing them from 
downstream analysis. Specifically, we removed from our liver datasets 
cells that fell into clusters 1 and 13 of the initial dataset in Extended Data
Fig. 1d. 

Using further clustering followed by signature analysis, we inter- 
rogated this post-processed liver-resident cell dataset for robust cell 
lineages. These lineages were isolated into individual datasets, and 
the process was iterated to identify robust lineage subpopulations. At 
each stage of this process we removed clusters expressing more than 
one unique lineage signature in more than 25% of their cells from the 
dataset as probable doublets. This resulted in removal of 1,351 cells. 
Where the cell proliferation signature identified distinct cycling sub- 
populations, we re-clustered these again to ascertain the identity of 
their constituent cells. 
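
The doublet criterion described above can be expressed as a short Python sketch. It assumes each cell has already been assigned the set of lineage signatures it expresses (for example, those with a non-zero thresholded score); the function and variable names are hypothetical.

```python
def probable_doublet_clusters(cluster_of_cell, signatures_of_cell, max_fraction=0.25):
    """Return clusters in which more than `max_fraction` of cells
    express more than one lineage signature."""
    totals, multi_lineage = {}, {}
    for cluster, signatures in zip(cluster_of_cell, signatures_of_cell):
        totals[cluster] = totals.get(cluster, 0) + 1
        if len(signatures) > 1:
            multi_lineage[cluster] = multi_lineage.get(cluster, 0) + 1
    return {cluster for cluster, n in totals.items()
            if multi_lineage.get(cluster, 0) / n > max_fraction}


# Hypothetical example: cluster "b" has half of its cells expressing two signatures.
clusters = ["a", "a", "a", "a", "b", "b"]
signatures = [{"MP"}, {"MP"}, {"MP", "T cell"}, {"MP"},
              {"Endothelia", "T cell"}, {"Endothelia"}]
flagged = probable_doublet_clusters(clusters, signatures)
```

Note that the comparison is strict: a cluster with exactly 25% multi-lineage cells (cluster "a" here) is retained.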

The mouse scRNA-seq datasets were combined, clustered and interrogated
for cell lineages in a similar manner to their human counterparts.

All heat maps, t-distributed stochastic neighbour embedding (t-SNE)
and UMAP visualizations, violin plots and dot plots were produced using
Seurat functions in conjunction with the ggplot2, pheatmap and grid R
packages. t-SNE and UMAP visualizations were constructed using the
same number of principal components as the associated clustering, 
with perplexity ranging from 30 to 300 according to the number of cells 
in the dataset or lineage. We conducted differential gene expression 
analysis in Seurat using the standard AUC classifier to assess significance. 
We retained only those genes with a log-transformed fold change of 
at least 0.25 and expression in at least 25% of cells in the cluster under 
comparison. 
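
To make the marker-gene criteria concrete, the following Python sketch computes an AUC for one gene and applies the two retention filters. It is an illustration, not Seurat's implementation. The fold change is assumed to be the natural-log ratio of mean expression after reversing the log transform, and "expression" is taken to mean a non-zero value. Function names and data are hypothetical.

```python
import math


def auc(cluster, rest):
    """Probability that a random cluster cell outranks a random other cell
    for this gene (ties count as one half)."""
    wins = sum((a > b) + 0.5 * (a == b) for a in cluster for b in rest)
    return wins / (len(cluster) * len(rest))


def log_fold_change(cluster, rest):
    """Natural-log fold change of mean expression, from log-normalized values."""
    def mean_linear(values):
        return sum(math.expm1(v) for v in values) / len(values)
    return math.log(mean_linear(cluster) + 1.0) - math.log(mean_linear(rest) + 1.0)


def keep_gene(cluster, rest, min_logfc=0.25, min_pct=0.25):
    """Retain a gene with sufficient fold change and detection in the cluster."""
    pct_expressing = sum(1 for v in cluster if v > 0) / len(cluster)
    return log_fold_change(cluster, rest) >= min_logfc and pct_expressing >= min_pct


# Hypothetical log-normalized expression of one gene.
in_cluster = [2.0, 1.5, 0.0, 2.5]
elsewhere = [0.0, 0.0, 0.5, 0.0]
print(auc(in_cluster, elsewhere), keep_gene(in_cluster, elsewhere))
```

An AUC near 1 indicates that the gene alone separates the cluster from the remaining cells, whereas 0.5 indicates no discriminating power.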


Defining cell lineage signatures 

For each cell, we obtained a signature score across a curated list of known 
marker genes per cell lineage in the liver (Supplementary Table 2). This 
signature score was defined as the geometric mean of the expression 
of the associated signature genes in that cell. Lineage signature scores 
were scaled from 0 to 1 across the dataset, and the score for each cell
with a signature less than a given threshold (the mean of said signature
score across the entire dataset) was set to 0. 
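
The signature score can be illustrated with a minimal Python sketch. It assumes non-negative expression values, so that a cell with zero expression of any signature gene scores 0, and min-max scaling across the dataset; the function names and example values are hypothetical.

```python
import math


def geometric_mean(values):
    """Geometric mean; zero if any value is zero (or negative)."""
    if any(v <= 0 for v in values):
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))


def signature_scores(expression_by_cell, signature_genes):
    """Per-cell signature score: geometric mean of the signature genes,
    scaled to [0, 1] across the dataset, with below-mean scores set to 0."""
    raw = [geometric_mean([cell[g] for g in signature_genes])
           for cell in expression_by_cell]
    low, high = min(raw), max(raw)
    scaled = [(r - low) / (high - low) if high > low else 0.0 for r in raw]
    threshold = sum(scaled) / len(scaled)
    return [s if s >= threshold else 0.0 for s in scaled]


# Hypothetical expression of a two-gene signature in four cells.
cells = [{"A": 4.0, "B": 1.0}, {"A": 1.0, "B": 1.0},
         {"A": 0.0, "B": 5.0}, {"A": 2.0, "B": 2.0}]
scores = signature_scores(cells, ["A", "B"])
```

Using a geometric rather than an arithmetic mean requires co-expression of the marker genes: a single highly expressed marker cannot produce a high score on its own.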


Batch effect and quality control 
To investigate agreement between samples, we extracted the average 
expression profile for a given cell lineage in each sample, and calculated 
the Pearson correlation coefficients between all possible pairwise comparisons
of samples per lineage.
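
A sketch of this between-sample comparison in plain Python is given below. It assumes that each sample contributes a list of per-cell expression vectors for one lineage, with genes in the same order; the function names and sample labels are hypothetical.

```python
import math
from itertools import combinations


def mean_profile(cells):
    """Average expression per gene across the cells of one sample."""
    n_cells = len(cells)
    return [sum(cell[g] for cell in cells) / n_cells for g in range(len(cells[0]))]


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def pairwise_sample_correlations(cells_by_sample):
    """Correlation of average lineage profiles for every pair of samples."""
    profiles = {name: mean_profile(cells) for name, cells in cells_by_sample.items()}
    return {(a, b): pearson(profiles[a], profiles[b])
            for a, b in combinations(sorted(profiles), 2)}
```

High correlations between samples of the same lineage suggest that clustering is driven by cell identity rather than by batch.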


Imputing dropout in T cell and ILC clusters

To impute dropout of low-abundance transcripts in our T cell and ILC 
clusters so that we might associate them with known subpopulations, 
we downsampled to 7,380 cells from 36,900 and applied the scImpute R
package v.0.0.8, using as input both our previous annotation labels and
k-means spectral clustering (k=5), but otherwise default parameters. 


Analysing functional phenotypes of scar-associated cells 

For further analysis of function we adopted the self-organizing maps 
approach as implemented in the SCRAT R package v.1.0.0. For each
lineage of interest, we constructed a self-organizing map in SCRAT using 
default input parameters and according to its clusters. We defined the 
signatures expressed in a cell by applying a threshold criterion
(e^thresh = 0.95 × e^max) selecting the highest-expressed metagenes in each cell,
and identified for further analysis those metagene signatures defining at
least 30% of cells in at least one cluster within the lineage. We smoothed 
these self-organizing maps using the ‘disaggregate’ function from the 
raster R package for visualization purposes, and scaled radar plots to 
maximum proportional expression of the signature. Gene Ontology 
enrichment analysis on the genes in these spots was performed using 
PANTHER 13.1 (pantherdb.org). 
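
The metagene selection can be sketched as follows, reading the threshold criterion as 0.95 times the maximum metagene expression within each cell (an interpretation, and one that assumes non-negative metagene expression). This is not the SCRAT implementation; the function names and values are hypothetical.

```python
def expressed_metagenes(cell_expression, fraction=0.95):
    """Metagenes whose expression reaches `fraction` of the cell's maximum."""
    cutoff = fraction * max(cell_expression.values())
    return {metagene for metagene, value in cell_expression.items() if value >= cutoff}


def signatures_for_analysis(cells, clusters, min_cell_fraction=0.30):
    """Metagenes that define at least `min_cell_fraction` of the cells
    in at least one cluster."""
    cluster_sizes, hits = {}, {}
    for expression, cluster in zip(cells, clusters):
        cluster_sizes[cluster] = cluster_sizes.get(cluster, 0) + 1
        for metagene in expressed_metagenes(expression):
            key = (cluster, metagene)
            hits[key] = hits.get(key, 0) + 1
    return {metagene for (cluster, metagene), n in hits.items()
            if n / cluster_sizes[cluster] >= min_cell_fraction}


# Hypothetical metagene expression for five cells in two clusters.
cells = [{"m1": 10.0, "m2": 9.6, "m3": 1.0}, {"m1": 10.0, "m2": 2.0, "m3": 1.0},
         {"m1": 1.0, "m2": 2.0, "m3": 8.0}, {"m1": 10.0, "m2": 1.0, "m3": 1.0},
         {"m1": 1.0, "m2": 1.0, "m3": 9.0}]
clusters = ["a", "a", "a", "a", "b"]
selected = signatures_for_analysis(cells, clusters)
```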


Inferring injury dynamics and transcriptional regulation 

To generate cellular trajectories (pseudotemporal dynamics) we used 
the monocle R package v.2.6.1. We ordered cells in a semi-supervised
manner on the basis of their Seurat clustering, scaled the resulting pseudotime
values from 0 to 1, and mapped them onto either the t-SNE or
UMAP visualizations generated by Seurat or diffusion maps as implemented
in the scater R package v.1.4.0 using the top 500 variable genes
as input. We removed mitochondrial and ribosomal genes from the gene
set for the purposes of trajectory analysis. Differentially expressed genes
along this trajectory were identified using generalized linear models via 
the ‘differentialGeneTest’ function in monocle. 
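
As an illustration of the scaling step, pseudotime values can be mapped onto the interval from 0 to 1 by min-max rescaling. The sketch below is written in Python rather than R, and the function name is hypothetical; it is one plausible reading of "scaled from 0 to 1", not the authors' script.

```python
def scale_pseudotime(values):
    """Min-max rescale pseudotime values onto [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Degenerate trajectory: every cell shares the same pseudotime.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


scaled = scale_pseudotime([2.0, 4.0, 6.0])
```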

When determining significance for differential gene expression along 
the trajectory, we set a q-value threshold of 1x 10°. We clustered these
genes using hierarchical clustering in pheatmap, cutting the tree at 
k=3 to obtain gene modules with correlated gene expression across 
pseudotime. Cubic smoothing spline curves were fitted to scaled gene 
expression along this trajectory using the smooth.spline command 
from the stats R package, and Gene Ontology enrichment analysis again
performed using PANTHER 13.1. 

We verified the trajectory and its directionality using the velo- 
cyto R package v.0.6.0”, estimating cell velocities from their spliced 
and unspliced mRNA content. We generated annotated spliced and 
unspliced reads from the 10X BAM files via the ‘dropEst’ pipeline, 
before calculating gene-relative velocity using KNN pooling with 
k= 25, determining slope gamma with the entire range of cellular 
expression, and fitting gene offsets using spanning reads. Aggregate 
velocity fields (using Gaussian smoothing on a regular grid) and
transition probabilities per lineage subpopulations were visualized 
on t-SNE, UMAP, or diffusion map visualizations as previously gen- 
erated. Gene-specific phase portraits were plotted by calculating 
spliced and unspliced mRNA levels against steady-state inferred 
by a linear model; levels of unspliced mRNA above and below this
steady-state indicate increasing and decreasing expression of said
gene, respectively. Similarly, we plotted the unspliced count signal
residual per gene, based on the estimated gamma fit, with positive
and negative residuals indicating expected upregulation and down- 
regulation, respectively. 

For transcription factor analysis, we obtained a list of all genes identified
as acting as transcription factors in humans from AnimalTFDB*. To
analyse transcription factor regulons further, we adopted the SCENIC
v.0.1.7 workflow in R“, using default parameters and the normalized data
matrices from Seurat as input. For visualization, we mapped the regulon 
activity (AUC) scores thus generated to the pseudotemporal trajectories 
from monocle and the clustering subpopulations from Seurat. 


Analysing inter-lineage interactions within the fibrotic niche 

For comprehensive systematic analysis of inter-lineage interactions 
within the fibrotic niche, we used CellPhoneDB”. CellPhoneDB is a 
manually curated repository of ligands, receptors and their interac- 
tions, integrated with a statistical framework for inferring cell-cell 
communication networks from single-cell transcriptomic data. In brief, 
we derived potential ligand-receptor interactions on the basis of the
expression of a receptor by one lineage subpopulation and a ligand
by another; as input to this algorithm, we used cells from the fibrotic
niche as well as liver sinusoidal endothelial cells and KCs as controls,
and we considered only ligands and receptors expressed in greater
than 5% of the cells in any given subpopulation. Subpopulation-specific
interactions were identified as follows: (1) randomly permuting the
cluster labels of all cells 1,000 times and determining the mean of
the average receptor expression of a subpopulation and the average
ligand expression of the interacting subpopulation, thus generating
a null distribution for each ligand-receptor pair in each pairwise comparison
between subpopulations; (2) calculating the proportion of
these means that were ‘as or more extreme’ than the actual mean, thus
obtaining a P value for the likelihood of subpopulation specificity for a
given ligand-receptor pair; (3) prioritizing interactions that displayed 
specificity to subpopulations interacting within the fibrotic niche. 
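
Steps (1) and (2) above amount to a label-permutation test. The following Python sketch illustrates the idea for a single ligand-receptor pair and a single pair of subpopulations; it is not the CellPhoneDB code. The function names, the fixed random seed and the one-sided reading of ‘as or more extreme’ (permuted mean at least as large as the observed mean) are assumptions.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def interaction_score(labels, receptor, ligand, pop_r, pop_l):
    """Mean of average receptor expression in pop_r and average ligand expression in pop_l."""
    r = mean([e for lab, e in zip(labels, receptor) if lab == pop_r])
    l = mean([e for lab, e in zip(labels, ligand) if lab == pop_l])
    return (r + l) / 2


def permutation_p_value(labels, receptor, ligand, pop_r, pop_l, n_perm=1000, seed=0):
    """Proportion of label permutations scoring at least as high as the observed labels."""
    rng = random.Random(seed)
    observed = interaction_score(labels, receptor, ligand, pop_r, pop_l)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # permute cluster labels across all cells
        if interaction_score(shuffled, receptor, ligand, pop_r, pop_l) >= observed:
            hits += 1
    return hits / n_perm


labels = ["A"] * 10 + ["B"] * 10
receptor = [5.0] * 10 + [0.0] * 10  # receptor restricted to subpopulation A
ligand = [0.0] * 10 + [5.0] * 10    # ligand restricted to subpopulation B
p_specific = permutation_p_value(labels, receptor, ligand, "A", "B")
p_uniform = permutation_p_value(labels, [1.0] * 20, [1.0] * 20, "A", "B")
```

Because shuffling preserves the number of cells per label, every permuted subpopulation stays non-empty. A pair expressed uniformly across all cells gives a P value of 1, whereas a pair restricted to the two interacting subpopulations gives a small one.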


Canonical correlation analysis 

To compare human and mouse populations of monocytic phagocytes, 
we used canonical correlation analysis as implemented in Seurat'®. We 
map the genes in the human dataset to their mouse orthologues using
biomaRt, discarding any genes for which no orthologues can be found. 
We then calculate the shared low-dimensional subspace on the union 
of genes that are variably expressed in both datasets (n = 159), and 
align using six canonical components as determined by evaluating the 
biweight midcorrelation. Results are visualized by t-SNE analysis as 
previously described. 


Deconvolution of whole liver microarray data 

To assess the macrophage composition of early-stage NAFLD, we per- 
formed deconvolution analysis on publicly available microarray data 
from annotated liver biopsy specimens taken across the NAFLD disease 
spectrum (GEO accession GSE48452)”. Tissue MP cells from our human 
scRNA-seq data were manually clustered into the main annotated MP 
populations. Signature gene expression profiles of SAMacs, TMs and 
KCs were used to deconvolve the monocyte-macrophage composition 
of liver biopsy samples from GSE48452 using Cibersort™, as previously 
described”. The monocyte-macrophage composition of each biopsy 
sample was then compared to the associated histological and demo- 
graphic features, available from the GEO database. 


Statistics and reproducibility 

To assess whether our identified subpopulations were significantly 
overexpressed in injury, we posited the proportion of injured cells in 
each cluster as a random count variable using a Poisson process, as previously
described*°. We modelled the rate of detection using the total
number of cells in the lineage profiled in a given sample as an offset, with 
the condition of each sample (healthy versus cirrhotic) provided as a
covariate factor. The model was fitted using the R command ‘glm’ from
the stats package. The P value for the significance of the proportion of
injured cells was assessed using a Wald test on the regression coefficient.
Remaining statistical analyses were performed using GraphPad Prism. 
Comparison of changes between two groups was performed using a 
Mann-Whitney test (unpaired; two-tailed) or a Wilcoxon matched-pairs 
signed rank test (paired; two-tailed). Comparison of changes between 
multiple groups was performed using a Kruskal-Wallis and Dunn, one- 
way ANOVA and Tukey or repeated measures one-way ANOVA and Tukey 
tests. Correlations were performed using Pearson correlation and best-fit
line plotted using linear regression. P < 0.05 was considered statistically
significant. All immunofluorescence stains were repeated in a
minimum of three patients and representative images are displayed. 
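
The Poisson model with a per-sample offset described at the start of this section can be sketched in Python. This is not the authors' R code. It relies on the fact that, for a log-link Poisson regression with an offset and a single two-level factor, the maximum-likelihood estimates have a closed form: the fitted rate in each group is the summed count divided by the summed offset total, and the variance of the log rate ratio is the sum of the reciprocal group counts. The function name and example numbers are hypothetical.

```python
import math


def poisson_two_group_wald(counts, totals, cirrhotic):
    """Wald test for the condition coefficient in
    log E[count_i] = log(total_i) + b0 + b1 * cirrhotic_i.

    counts[i]: cells of the cluster in sample i
    totals[i]: cells of the lineage profiled in sample i (the offset)
    cirrhotic[i]: True for cirrhotic samples, False for healthy samples
    """
    y = {False: 0, True: 0}
    n = {False: 0, True: 0}
    for c, t, g in zip(counts, totals, cirrhotic):
        y[g] += c
        n[g] += t
    if min(y.values()) == 0:
        raise ValueError("closed form needs a non-zero count in both groups")
    # Log rate ratio (cirrhotic versus healthy) and its standard error.
    beta = math.log(y[True] / n[True]) - math.log(y[False] / n[False])
    se = math.sqrt(1 / y[True] + 1 / y[False])
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal P value
    return beta, se, z, p


beta, se, z, p = poisson_two_group_wald(
    counts=[10, 20, 30, 90, 100, 110],
    totals=[1000] * 6,
    cirrhotic=[False, False, False, True, True, True],
)
```

With additional covariates the closed form no longer applies and an iterative fit, such as the one performed by ‘glm’, would be needed.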


Reporting summary 


Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


Our expression data are freely available for user-friendly interactive 
browsing online at http://www.livercellatlas.mvm.ed.ac.uk. CellPhoneDB
is available at www.CellPhoneDB.org. All raw sequencing data
have been deposited in the Gene Expression Omnibus (GEO) under 
accession GSE136103. 


Code availability 


R scripts enabling the main steps of the analysis are available from the
corresponding authors on reasonable request.

Extended Data Fig. 1 | Strategy for isolation of human liver non-parenchymal
cells. a, Patient demographics and clinical information. Data are mean ± s.e.m.
b, Flow cytometry gating strategy for isolation of leucocytes (CD45+) and other
non-parenchymal cells (CD45−) from human liver; representative plots from ten
livers. c, Flow cytometry gating strategy for isolation of PBMCs; representative
plots from four patients. d, Clustering 103,568 cells from healthy (n=5) and 
cirrhotic (n=5) livers, healthy PBMCs (n=1) and cirrhotic PBMCs (n= 4) (left), 
annotating the source (PBMC versus liver; middle) and cell lineage inferred from 
known marker gene signatures (right). e, Dot plot annotating PBMC and liver 
clusters by lineage signatures. Circle size indicates cell fraction expressing 
signature greater than mean; colour indicates mean signature expression (red, 
high; blue, low). f, CXCR4 gene expression in single cells derived from blood or
liver tissue, divided by cell lineage. Bottom right, representative
immunofluorescence image (n > 3) of CXCR4 (green) and DAPI (blue) in human
liver; arrows denote CXCR4- cells in the lumen of a blood vessel. Scale bar, 50 μm.
g, Violin plots showing the number of unique genes (nGene), number of total 
unique molecular identifiers (nUMI) and mitochondrial gene fraction expressed 
in five PBMC samples. Black lines denote the median. h, Pie charts showing the 
proportion of cell lineages per PBMC sample. i, Box and whisker plots showing 
the agreement in expression profiles across five PBMC samples. Pearson 
correlation coefficients between average expression profiles for cells in each 
lineage, across all pairs of samples. Black bars denote the median; box edges 
denote the twenty-fifth and seventy-fifth percentiles; whiskers denote the full 
range.

Extended Data Fig. 2 | Quality control and annotation of human liver-resident
cells. a, Lineage signature expression across 66,135 liver-resident
cells from healthy (n=5) and cirrhotic (n=5) human livers (red, high; blue, low). 
b, Dot plot annotating liver-resident cell clusters by lineage signature. Circle 
size indicates cell fraction expressing signature greater than mean; colour 
indicates mean signature expression (red, high; blue, low). c, Violin plots of the 
number of unique genes (left), number of total UMIs (middle) and 
mitochondrial gene fraction (right) across 66,135 liver-resident cells from
healthy (n=5) and cirrhotic (n=5) livers. Black lines denote the median. d, Pie
charts of the proportion of cell lineage per liver sample. e, Box and whisker 
plots of the agreement in expression profiles across healthy (n=5) and 
cirrhotic (n=5) liver samples, as in Extended Data Fig. 1i. f, t-SNE visualization of
liver-resident cells per liver sample, with cirrhotic samples annotated by 
aetiology of underlying liver disease. ALD, alcohol-related liver disease; PBC, 
primary biliary cholangitis.

Extended Data Fig. 3 | Annotating human liver lymphoid cells. a, Clustering
of 36,900 T cells and ILCs (left) from healthy (n=5) and cirrhotic (n=5) human
livers, annotating the injury condition (right). NK, natural killer cell; cNK,
cytotoxic NK cell. b, Fractions of T cell and ILC subpopulations in healthy (n=5)
and cirrhotic (n=5) livers. c, Selected gene expression in 36,900 T cells and ILCs.
d, Heat map of T cell and ILC cluster marker genes (colour-coded by cluster and
condition), with exemplar genes labelled (right). Columns denote cells; rows
denote genes. e, t-SNE visualizations of downsampled T cell and ILC dataset
(7,380 cells from healthy (n=5) and cirrhotic (n=5) human livers) before and
after imputation (scImpute); annotating data used for visualization and
clustering, inferred lineage and injury condition. No additional heterogeneity
was observed after imputation. f, Clustering 2,746 B cells and plasma cells (left)
from healthy (n=5) and cirrhotic (n=5) human livers, annotating the injury
condition (right). g, Heat map of B cell and plasma cell cluster marker genes
(colour-coded by cluster and condition), with exemplar genes labelled (right).
Columns denote cells; rows denote genes. h, Fractions of B cell and plasma cell
subpopulations in healthy (n=5) and cirrhotic (n=5) livers. Data are
mean ± s.e.m. P values determined by Wald test (b).


Extended Data Fig. 4 | Annotating human liver MPs. a, Clustering and
selected genes expressed in 10,737 MPs from healthy (n=5) and cirrhotic (n=5)
human livers. b, Scaled gene expression of KC cluster markers across MP cells
from healthy (n=5) and cirrhotic (n=5) livers. c, Representative
immunofluorescence images (n > 3) of TIMD4 (red), CD163 (white), MARCO
(green) and DAPI (blue) in healthy and cirrhotic liver; arrows denote
CD163*MARCO'TIMD4 cells.
d, Immunohistochemistry (left) and cell counts (right) of TIMD4 expression in
healthy (n=12) and cirrhotic (n=9) human liver. e, Immunohistochemistry (left)
and cell counts (right) of MARCO expression in healthy (n=8) and cirrhotic
(n=8) liver. f, Flow cytometry gating strategy for identifying KCs, TMs and
SAMacs in healthy (n=2) and cirrhotic (n=3) liver. SAMacs are detected as
TREM2°CD9 cells within the TM and SAMac gate (see Fig. 2f). g, Representative
immunofluorescence images (n > 3) of TREM2 (red), MNDA (white), collagen 1
(green) and DAPI (blue) in cirrhotic liver. h, Representative images (n=2) of
TREM2 (smFISH; red), MNDA (immunofluorescence; green) and DAPI (blue) in
cirrhotic liver. i, Representative immunofluorescence images (n > 3) of CD9
(red), MNDA (white), collagen 1 (green) and DAPI (blue) in cirrhotic liver.
j, Immunohistochemistry (top) and cell counts (bottom) of TREM2 expression in
healthy (n=10) and cirrhotic (n=9) liver. k, Immunohistochemistry (top) and cell
counts (bottom) of CD9 expression in healthy (n=12) and cirrhotic (n=10) liver.
l, Top, exemplar tissue segmentation of cirrhotic liver section into fibrotic
septae (orange) and parenchymal nodules (purple). Bottom, cell counts based
on immunohistochemistry analysis of TREM2 (n=9), CD9 (n=11), TIMD4 (n=9)
and MARCO (n=7) in parenchymal nodules and fibrotic septae. m, Top,
clustering and annotation of 208 cycling MP cells from healthy (n=5) and
cirrhotic (n=5) livers, with scaled gene expression of MP subpopulation markers
across four clusters of cycling MP cells. Bottom, fractions of cycling MP
subpopulations in healthy (n=5) and cirrhotic (n=5) livers. All scale bars, 50 μm.
Data are mean ± s.e.m. P values determined by two-tailed Mann-Whitney (e, j, k),


Extended Data Fig. 5 | Phenotypic characterization of mononuclear 
phagocytes in healthy and cirrhotic human livers. a, Top, self-organizing map 
(60 x 60 grid) of smoothed scaled metagene expression of 10,737 MPs from 
healthy (n=5) and cirrhotic (n=5) livers. In total, 20,952 genes, 3,600 metagenes 
and 44 signatures were identified. A-F denote metagene signatures 
overexpressed in one or more MP subpopulations. Bottom, smoothed mean 
metagene expression profile for each MP subpopulation. b, Radar plots (left), 
exemplar genes (middle) and selected GO enrichment (right) of metagene 
signatures A-F showing distribution of signature expression across MP 
subpopulations from 10,737 MP cells. c, Diffusion map (DM) visualization of 
blood monocytes and liver-resident MP lineages (23,075 cells from healthy 
(n=5) and cirrhotic (n=5) liver samples and PBMCs (n=5)), annotating monocle 
pseudotemporal dynamics (purple to yellow). Top, RNA velocity field (red 
arrows) visualized using Gaussian smoothing on regular grid. Bottom, 
annotation of MPs by subpopulation and injury condition. d, Unspliced-spliced 
phase portraits (top); 23,075 cells coloured and visualized as in Fig. 3a; monocyte
(MNDA), SAMac (CD9) and KC (TIMD4) marker genes. Cells plotted above or 
below the steady-state (black dashed line) indicate increasing or decreasing 
expression of gene, respectively. Spliced expression profile for stated genes
(middle row; red, high, blue, low). Unspliced residuals for stated genes (bottom
row), positive (red) indicating expected upregulation, negative (blue) indicating 
expected downregulation. MNDA displays negative velocity in SAMacs; CD9 
displays positive velocity in monocytes and SAMacs; TIMD4 velocity is restricted
to KCs. e, Cubic smoothing spline curve fitted to averaged expression of all 
genes in module 2 from the blood monocyte-to-SAMac pseudotemporal 
trajectory (see Fig. 3c), with selected GO enrichment (right). f, Cubic smoothing 
spline curve fitted to averaged expression of all genes in module 3 from the
blood monocyte-to-cDC pseudotemporal trajectory (see Fig. 3c), with selected 
GO enrichment (right). g, Luminex assay showing quantification of levels of 
stated proteins in culture medium from FACS-isolated SAMacs (n=3), TMs (n=2)
and KCs (n=2). Control denotes medium alone (n=2). Data are mean ± s.e.m. h,
Heat map of transcription factor regulons across MP pseudotemporal trajectory
and in KCs (colour-coded by MP cluster, condition and pseudotime), with
selected regulons labelled (right). Columns denote cells; rows denote genes.
i, Scaled regulon expression of selected regulons across MP clusters from
healthy (n=5) and cirrhotic (n=5) livers. All P values determined by Fisher’s
exact test.

Extended Data Fig. 6 | Characterization of macrophages in mouse liver
fibrosis. a, Clustering and annotating 3,250 mouse (m)MPs from healthy (n=3)
and fibrotic (4 weeks CCl4 treatment; n=3) livers. b, Annotating mouse MP cells
by injury condition. c, Heat map of mouse MP cluster marker genes (top; colour-coded
by cluster and condition), with exemplar genes labelled (right). Columns
denote cells; rows denote genes. d, Selected genes expressed in 3,250 mouse
MPs. e, Representative flow cytometry plots of the gating strategy (n=8 from
two independent experiments) for identifying mouse KCs, CD9− TMs and CD9+
SAMacs in fibrotic mice. f, Quantifying mouse macrophage subpopulations by
flow cytometry in healthy (n=6) and fibrotic (n=8) mouse livers from two
independent experiments. The macrophage subpopulation (x axis) is shown as a
percentage of total viable CD45+ cells (y axis). Data are mean ± s.e.m. P values
determined by two-tailed Mann-Whitney test. g, Co-culture of primary mouse
HSCs from uninjured livers and either FACS-isolated CD9− mouse TMs or CD9+
mouse SAMacs from fibrotic livers (n=8 mice; two independent experiments).
Right, qPCR of Col3a1 expression in HSCs; expression relative to mean
expression of quiescent HSC. P value determined by two-tailed Wilcoxon test.
h, Clustering 3,250 mouse MPs and 10,737 human (h)MPs into five clusters using
canonical correlation analysis. Annotation of cross-species clusters (identity).
i, Annotation of human and mouse macrophage subpopulations from 3,250
mouse MPs and 10,737 human MPs. j, Selected genes expressed in 3,250 mouse
MPs and 10,737 human MPs.

Extended Data Fig. 7 | SAMac expansion in human NASH. a-d, Deconvolution
of publicly available whole liver microarray data (n=73) assessed for frequency
of SAMacs, KCs and TMs using the Cibersort algorithm. a, Macrophage
composition. GEO accession numbers are shown on the x axis; the fraction of
monocyte-macrophages is shown on the y axis. Liver phenotypes are annotated
at the top. b, Frequency of SAMacs in control (n=14), healthy obese (n=27),
steatosis (n=14) and NASH (n=18) livers. c, Left, frequency of SAMacs in patients
with histological NAFLD activity scores (NAS) of 0 (n=37), 1-3 (n=19) and 4-7
(n=17). Right, frequency of SAMacs in patients with histological fibrosis scores
of 0 (n=46), 1 (n=20) and 2-4 (n=5). d, Left, frequency of SAMacs in female
(n=58) and male (n=15) patients. Middle, frequency of SAMacs in patients aged
(n=27). e, Left, immunohistochemistry of CD9 and TREM2 expression in NAFLD
liver biopsy sections. Scale bars, 50 μm. Right, cell counts of CD9 and TREM2
expression. CD9: NAS 1-3 (n=13), NAS 4-8 (n=21). TREM2: NAS 1-3 (n=12), NAS
4-8 (n=16). f, Correlation of cell counts with picrosirius red (PSR) digital
morphometric pixel quantification in NAFLD liver biopsy tissue with CD9

Extended Data Fig. 8 | Phenotypic characterization of endothelial cells in
healthy and cirrhotic human livers. a, Clustering and selected genes
expressed in 8,020 endothelial cells from healthy (n=4) and cirrhotic (n=3)
human livers. b, Scaled gene expression of endothelial cluster markers across
endothelial cells from healthy (n=4) and cirrhotic (n=3) livers. c, Top, digital
pixel quantification of PLVAP immunostaining of cirrhotic liver sections (n=10)
in parenchymal nodules and fibrotic septae. Bottom, ACKR1 immunostaining of
cirrhotic liver sections (n=10) in parenchymal nodules and fibrotic septae.
d, Flow cytometry analysis of PLVAP, CD34 and ACKR1 in endothelial cells from
healthy (n=3, grey) or cirrhotic (n=7, red) livers. Top, representative
histograms; bottom, MFI values. e, Flow-based adhesion assay. Peripheral blood
monocytes assessed for adhesion to primary human liver endothelial cells (top)
and the percentage of adherent monocytes that transmigrate (bottom);
endothelial cells isolated from healthy (n=5) or cirrhotic (n=4) livers.
f, Endothelial cell gene knockdown. Cirrhotic endothelial cells were treated with
siRNA against PLVAP (n=6) or ACKR1 (n=5) or with control siRNA (n=6). Top,
representative flow cytometry histograms for stated markers, with comparison
to isotype control antibody. Bottom, flow-based adhesion assay, with PBMCs
assessed for adhesion (bottom left) and the percentage of adherent cells that
transmigrate (bottom right) after siRNA treatment of endothelial cells. g, Top
left, self-organizing map (60 x 60 grid) of smoothed scaled metagene
expression of endothelial lineage. In total, 21,237 genes, 3,600 metagenes and 45
signatures were identified. A-E denote metagene signatures overexpressed in
one or more endothelial subpopulations. Bottom left, smoothed mean
metagene expression profile for each endothelial subpopulation. Middle, radar
plots of metagene signatures A-E showing distribution of signature expression
across endothelial subpopulations, exemplar genes (middle) and Gene Ontology
enrichment (right). h, Heat map of endothelial subpopulation transcription
factor regulon expression (colour-coded by cluster and condition) across 8,020
endothelial cells from healthy (n=4) and cirrhotic (n=3) human livers. Exemplar
regulons are labelled (right). Columns denote cells; rows denote regulons.
i, t-SNE visualization of endothelial lineage (8,020 cells from healthy (n=4) and
cirrhotic (n=3) livers), annotating monocle pseudotemporal dynamics (purple
to yellow; grey indicates lack of inferred trajectory). RNA velocities (red arrows)
visualized using Gaussian smoothing on regular grid. j, Representative
immunofluorescence images (n > 3) of RSPO3, PDPN, AIF1L, VWA1 or ACKR1
(red), CD34 (white), PLVAP (green) and DAPI (blue) in healthy and cirrhotic liver.
Scale bars, 50 μm. k, Annotation of 8,020 endothelial cells by subpopulation and
injury condition. LSEC, liver sinusoidal endothelial cells. Data are mean ± s.e.m.
P values determined by two-tailed Wilcoxon test (c), two-tailed Mann-Whitney
test (d, e), Kruskal-Wallis and Dunn test (f), or Fisher’s exact test (g).

Extended Data Fig. 9 | Characterization of mesenchymal cells in healthy and
cirrhotic human livers. a, Selected genes expressed in 2,318 mesenchymal
cells from healthy (n=4) and cirrhotic (n=3) human livers. b, Clustering 319
SAMes into two further subclusters. c, Heat map of SAMes subcluster marker
genes (colour-coded by cluster and condition), with exemplar genes labelled
(right). Columns denote cells; rows denote genes. d, Fractions of SAMes
subpopulations in healthy (n=4) and cirrhotic (n=3) livers. e, f, Representative
immunofluorescence images (n > 3) of OSR1 (red), collagen 1 (green) and DAPI
(blue) in portal region of healthy liver (e) or fibrotic niche of cirrhotic liver (f).
Scale bars, 50 μm. g, Scaled gene expression of selected genes across 2,318
mesenchymal cells from healthy (n=4) and cirrhotic (n=3) livers. h, t-SNE
visualization of 1,178 HSCs and SAMes from healthy (n=4) and cirrhotic (n=3)
livers annotated by monocle pseudotemporal dynamics (purple to yellow). RNA
velocity field (red arrows) visualized using Gaussian smoothing on regular grid.
i, Heat map of cubic smoothing spline curves fitted to genes differentially
expressed across HSC-to-SAMes pseudotemporal trajectories, grouped by
hierarchical clustering (k=2); colour-coded by pseudotime and condition (top).
Gene co-expression modules (colour) and exemplar genes are labelled (right).
Data are mean ± s.e.m. P values determined by Wald test (d).

Extended Data Fig. 10 | Characterization of the cellular interactome in the
fibrotic niche. a, b, Representative immunofluorescence images (n > 3) of
fibrotic niche in cirrhotic liver. a, TREM2 (red), PLVAP (white), PDGFRα (green)
and DAPI (blue). b, TREM2 (red), ACKR1 (white), PDGFRα (green) and DAPI (blue).
c, Proliferation assay. Human HSCs were treated with conditioned medium from
primary hepatic macrophage subpopulations SAMac (n=2), TMs (n=2), KCs
(n=2) or control medium (n=2). The AUC of the percentage change in HSC
number over time (hours) is shown on the y axis. Data are mean ± s.e.m. d, Circle
plot showing potential interaction magnitude from ligands expressed by
SAMacs and SAEndos to receptors expressed on SAMes. e, Circle plot showing
potential interaction magnitude from ligands expressed by SAMes to receptors
expressed on SAMacs and SAEndos. f, Dot plot of ligand-receptor interactions
between SAMes (n=7 human livers), SAMacs (n=10 human livers) and SAEndos
(n=7 human livers). Ligand (red) and cognate receptor (blue) shown on the x
axis; populations that express ligand (red) and receptor (blue) are shown on the y
axis; circle size denotes P value (permutation test); colour (red, high; yellow, low)
denotes average ligand and receptor expression levels in interacting
subpopulations. g, Top, representative immunofluorescence image (n > 3) of
CCL2 (red), CCR2 (white), PDGFRα (green) and DAPI (blue) in fibrotic niche in
cirrhotic liver; arrows denote CCL2*PDGFRα‘ cells. Bottom, representative
immunofluorescence image (n > 3) of ANGPT1 (red), TEK (white), PDGFRα
(green) and DAPI (blue) in fibrotic niche in cirrhotic liver; arrows denote
ANGPT1°PDGFRα‘ cells. h, Circle plot denotes potential interaction magnitude
from ligands expressed by SAMacs to receptors expressed on SAEndos. i, Dot
plot of ligand-receptor interactions between SAMacs (n=10 human livers) and
SAEndos (n=7 human livers) as in f. j, Representative immunofluorescence
image (n > 3) of TREM2 (red), FLT1 (white), VEGFA (green) and DAPI (blue) in
fibrotic niche in cirrhotic liver; arrows denote TREM2‘VEGFA‘ cells. k, Circle plot
of the potential interaction magnitude from ligands expressed by SAEndos to
receptors expressed on SAMacs. l, Dot plot of ligand-receptor interactions
between SAEndo (n=7 human livers) and SAMacs (n=10 human livers) as in f.
m, Top, representative immunofluorescence image (n = 3) of TREM2 (red),
CD200 (white), CD200R (green) and DAPI (blue) in fibrotic niche in cirrhotic
liver; arrows denote TREM2*CD200R‘ cells. Bottom, representative
immunofluorescence image (n = 3) of TREM2 (red), DLL4 (white), NOTCH2
(green) and DAPI (blue) in fibrotic niche in cirrhotic liver; arrows denote
TREM2‘*NOTCH2‘ cells. All scale bars, 50 μm.

A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons
Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 


Data collection Initial processing of single-cell RNA-sequencing data was performed using the commercial CellRanger pipeline (10X Genomics, version
2.1.0, see Methods). Subsequent analysis was performed using the open-source R programming language (version 3.4.1). BD FACS
Sortware software was used for cell sorting on BD Influx equipment. BD FACS Diva software was used for flow cytometry on BD LSR
Fortessa equipment and for cell sorting on BD FACSAria Fusion and FACSAria II. Fluorescent and brightfield microscopy images were
acquired using Zen Blue software (Zeiss) on an Axioscan.Z1 instrument (Zeiss) or Confocal Microscope Zeiss LSM780. Luminex data was 
acquired on a Bio-Plex 200 (Bio-Rad). Cell Proliferation data was acquired on an Incucyte ZOOM live cell analysis system (Essen 
biosciences). RT-qPCR data was acquired on ABI 7900HT FAST PCR system (Applied Biosystems). 


Data analysis Immunofluorescent images were processed and analysed using Zen Blue software (Zeiss) and Fiji software (ImageJ version 2.00). Cell
proliferation data were analysed on the Incucyte proprietary analysis software (version 2018A). Immunohistochemistry images were
analysed using QuPath software (version 0.1.2) for automated cell counting and using Fiji software (ImageJ version 2.00) with Trainable
Weka Segmentation plugin (see Methods). Co-culture immunocytochemistry data was analysed using Imaris x64 (version 8.1.2). Flow
cytometry analysis was performed using FlowJo software (version 10.2). RT-qPCR data was analysed using ThermoFisher Connect cloud
qPCR software (version 2019.1.8-Q1-19-build4). Statistical analysis was performed either in R (version 3.4.1) or using Graphpad Prism
software (version 7.0a). Single-cell RNA-sequencing analysis was performed in R, based around the following packages: Seurat 2.3.0,
scImpute 0.0.8, SCRAT 1.0.0, monocle 2.6.1, scater 1.4.0, velocyto 0.6.0, SCENIC 0.1.7 (see Methods). We also made use of the
CellPhoneDB repository of ligands, receptors, and interactions. Deconvolution was performed using Cibersort. Gene Ontology 
enrichment analysis was performed using PANTHER 13.1. 


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information.

Policy information about availability of data
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 
- Accession codes, unique identifiers, or web links for publicly available datasets 


- A list of figures that have associated raw data 
- A description of any restrictions on data availability 


Our expression data will be freely available for user-friendly interactive browsing online at www.livercellatlas.mvm.ed.ac.uk. CellPhoneDB is available at 
www.CellPhoneDB.org, along with lists of membrane proteins, ligands and receptors, and heteromeric complexes. All raw sequencing data have been deposited in 
the Gene Expression Omnibus (GEO Accession GSE136103). We make available as Supplementary Tables: lists of lineage-specific genes for signature analysis 
(Extended Data Figure 1e, 2b); lists of marker genes and regulons from clustering results (Figure 1e, 2d, 4c, 5b, Extended Data Figure 3d, e, g, 5h, 6c, 8h, 9c); lists of 
module / signature genes from trajectory and self-organising map analyses and corresponding lists of gene ontology terms from enrichment analysis (Figure 3c, d, 
Extended Data Figure 5a, b, e, f, 8g); lists of significant interactions in the fibrotic niche as identified using CellPhoneDB (Figure 6a, e, Extended Data Figure 10f, i, l). 
Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size In total, we present scRNA-seq data from ten human liver samples (n=5 healthy and n=5 cirrhotic), five human blood samples (n=4 cirrhotic 
and n=1 healthy named PBMC8K; pbmc8k dataset sourced from single-cell gene expression datasets hosted by 10X Genomics) and two 
mouse samples (n=3 healthy and n=3 fibrotic). No statistical methods were used to predetermine sample size. Patient number was selected to 
give a balanced representation of healthy and cirrhotic liver cells and to provide sufficient cells of each lineage to facilitate more detailed 
analysis. Histology, flow cytometry, luminex, RT-qPCR and cell proliferation analysis were performed on multiple independent biological 
replicates (n shown in figure legends). 


Data exclusions Described in detail in Methods. Exclusion criteria were determined following initial assessment and QC of the data. Cells with low gene expression 
(fewer than 300 genes) or a high mitochondrial gene content (>30% of the total UMI count) were treated as low-quality outliers and 
were excluded. At each stage of the analysis we used signature analysis to identify and exclude potential doublet clusters. 
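
The two exclusion thresholds stated above can be sketched as a per-cell filter. This is only an illustrative Python sketch, not the authors' pipeline (which was run in R); the `Cell` record, its field names and the handling of cells with zero UMIs are assumptions made for the example.

```python
from dataclasses import dataclass

MIN_GENES = 300            # cells with fewer detected genes are excluded
MAX_MITO_FRACTION = 0.30   # cells with >30% mitochondrial UMIs are excluded


@dataclass
class Cell:
    """Hypothetical per-cell QC summary (field names are assumptions)."""
    barcode: str
    n_genes: int     # number of genes detected in the cell
    total_umi: int   # total UMI count for the cell
    mito_umi: int    # UMIs assigned to mitochondrial genes


def passes_qc(cell: Cell) -> bool:
    """Return True if the cell survives both exclusion criteria."""
    if cell.n_genes < MIN_GENES:
        return False
    if cell.total_umi == 0:
        # Assumption: a cell with no UMIs cannot be assessed and is dropped.
        return False
    return cell.mito_umi / cell.total_umi <= MAX_MITO_FRACTION


def filter_cells(cells):
    """Keep only the cells that pass QC, preserving input order."""
    return [c for c in cells if passes_qc(c)]
```

Doublet exclusion by signature analysis is a separate, cluster-level step and is not covered by this sketch.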


Replication All experimental findings reported here were successfully replicated across multiple biological samples (n reported in each figure legend). All 
immunofluorescence was performed on a minimum of 3 liver samples to identify representative images. 


Randomization One group of randomly selected healthy livers and another group of randomly selected cirrhotic livers were analysed in this study. All 
subsequent analyses were performed in randomly selected healthy or cirrhotic liver samples. For mouse experiments, age-matched littermate 
mice were randomly assigned to be healthy controls or receive carbon tetrachloride. 


Blinding Blinding to the origin of the tissue samples was not possible. All analyses were performed in an automated manner across conditions. 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 
Antibodies 


Antibodies used All antibodies used in this work, clone, application, supplier and lot number are listed in Supplementary Table 19. 


Validation All antibodies used are commercially available and validated by the vendor for the assay and species used in this study. Specific 
validation information for each antibody is available on the vendor's website. 
The specificity of each primary flow cytometry antibody was validated by staining directly against species-matched isotype and 
unstained controls. 
Validation of each primary antibody used for immunostaining was performed by comparison to species-matched isotype 
antibodies and unstained controls. 
Laboratory animals Male C57BL/6JCrl mice aged 8 to 10 weeks 

Wild animals Study did not involve wild animals 

Field-collected samples Study did not involve samples collected in the field 

Ethics oversight All experiments were performed in accordance with UK Home Office regulations. 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 


Human research participants 


Policy information about studies involving human research participants 


Population characteristics Please see Extended Data Figure 1a for the clinical characteristics of patients used for single-cell RNA sequencing. 


Recruitment Patients were recruited as described in Methods. Healthy background non-lesional liver tissue was obtained intraoperatively 
from patients undergoing surgical liver resection for solitary colorectal metastasis at the Hepatobiliary and Pancreatic Unit, 
Department of Clinical Surgery, Royal Infirmary of Edinburgh. Patients with a known history of chronic liver disease, abnormal 
liver function tests or those who had received systemic chemotherapy within the last four months were excluded from this 
cohort. Cirrhotic liver tissue was obtained intraoperatively from patients undergoing orthotopic liver transplantation at the 
Scottish Liver Transplant Unit, Royal Infirmary of Edinburgh. Blood was obtained from patients with a confirmed diagnosis of liver cirrhosis 
attending the Scottish Liver Transplant Unit, Royal Infirmary of Edinburgh. Patients with liver 
cirrhosis due to viral hepatitis were excluded from the study. For cell sorting of macrophages or isolation of human endothelial 
cells, liver tissue was acquired from explanted diseased livers from patients undergoing orthotopic liver transplantation, resected 
liver specimens or donor livers rejected for transplant at the Queen Elizabeth Hospital, Birmingham. 


Ethics oversight NRS BioResource and Tissue Governance Unit (Study Number SR574), following review at the East of Scotland Research Ethics 
Service (Reference 15/ES/0094) 
For University of Birmingham samples, separate local ethical approval was obtained (Reference 06/Q2708/11, 06/Q2702/61). 


Note that full information on the approval of the study protocol must also be provided in the manuscript. 
Plots 
Confirm that: 
[x] The axis labels state the marker and fluorochrome used (e.g. CD4-FITC). 
[x] The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a ‘group’ is an analysis of identical markers). 
[x] All plots are contour plots with outliers or pseudocolor plots. 
[x] A numerical value for number of cells or percentage (with statistics) is provided. 
Methodology 
Sample preparation Please see Methods for detailed sample preparation protocol for FACS and flow cytometry 
Instrument BD Influx and BD FACSAria II were used for cell sorting at University of Edinburgh. BD LSR Fortessa was used for flow cytometry 
analysis. BD FACSAria Fusion was used for cell sorting at the University of Birmingham. 
Software BD FACS Sortware software was used for cell sorting on BD Influx equipment. BD FACS Diva software was used for cell sorting on BD 
FACSAria II and BD FACSAria Fusion. BD FACS Diva software was used for flow cytometry on BD LSR Fortessa equipment. Flow 
cytometry analysis was performed using FlowJo software (version 10.2). 




Cell population abundance Sort purity was routinely over 95% on post-sort chec 


Gating strategy Please see Methods and Extended Data Figures 1b,c, 4f and 6e for gating strategies. Initial gating for all experiments: Cells (FSC-A 
vs SSC-A), Singlets (FSC-A vs FSC-H (or FSC-A vs TPW for BD Influx)), Viable (SSC-A vs viability dye (See methods)). For human 
PBMC sort, CD45+ CD66b- cells were sorted. For human liver single-cell RNA-seq sorting, CD45+ cells (leukocytes) or CD45- cells 
(other NPCs) were sorted. For human liver macrophage flow cytometry quantification and cell sorting, tissue monocyte- 
macrophages were identified as CD45+, Lin- (CD3, CD335, CD19, CD66b, LILRA4, CD326), HLA-DR+, CD1C-, CD14+ and/or CD16+ 
cells. SAM were then identified as CD163- TREM2+ CD9+, KCs were identified as CD163+ CD9- and TMo were identified as 
CD163-. For mouse liver single-cell RNA-seq sorting, tissue mononuclear phagocytes identified as CD45+ Lin-(CD3, NK1.1, Ly6G, 
CD19) cells were sorted. For mouse liver macrophage cell sorting, CD45+ Lin- CD11b+ F4/80+ TIMD4- CD9+ or CD9- cells were 
sorted from CCl4-treated mice. For human liver endothelial cell flow cytometry, cultured endothelial cells were stained with 
antibodies to PLVAP, ACKR1, JAG1 and CD34. Gates and boundaries were defined by comparison to FMO and unstained samples. 
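
The human liver macrophage gating hierarchy described above can be expressed as a small boolean classifier. The sketch below is illustrative only: real gates are drawn on fluorescence intensities against FMO and unstained controls, whereas this example assumes each cell has already been reduced to a set of markers called positive. The function name, the `"unclassified"` label for CD163+ CD9+ cells and the order in which the subsets are tested are assumptions, not part of the source.

```python
# Lineage (Lin) markers used to exclude non-macrophage cells.
LINEAGE_MARKERS = frozenset({"CD3", "CD335", "CD19", "CD66b", "LILRA4", "CD326"})


def classify_liver_macrophage(positive_markers):
    """Classify one cell from the set of markers it stains positive for.

    Returns "SAM", "KC", "TMo", "unclassified", or None when the cell is
    not a tissue monocyte-macrophage under the stated gating scheme.
    """
    pos = set(positive_markers)

    is_monocyte_macrophage = (
        "CD45" in pos
        and not (pos & LINEAGE_MARKERS)        # Lin-
        and "HLA-DR" in pos
        and "CD1C" not in pos
        and ("CD14" in pos or "CD16" in pos)   # CD14+ and/or CD16+
    )
    if not is_monocyte_macrophage:
        return None

    if "CD163" in pos:
        # KCs: CD163+ CD9-. CD163+ CD9+ cells are not named in the text,
        # so they are labelled "unclassified" here (assumption).
        return "KC" if "CD9" not in pos else "unclassified"
    if "TREM2" in pos and "CD9" in pos:
        return "SAM"   # CD163- TREM2+ CD9+
    return "TMo"       # remaining CD163- cells (assumed precedence)
```

Testing SAM before TMo reflects a reading of the text in which TMo are the CD163- cells that do not meet the SAM definition.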


[x] Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information. 
Immunosuppression increases the risk of cancers that are associated with viral 
infection. In particular, the risk of squamous cell carcinoma of the skin, which has 
been associated with beta human papillomavirus (β-HPV) infection, is increased by 
more than 100-fold in immunosuppressed patients. Previous studies have not 
established a causative role for HPVs in driving the development of skin cancer. Here 
we show that T cell immunity against commensal papillomaviruses suppresses skin 
cancer in immunocompetent hosts, and that the loss of this immunity, rather than the 
oncogenic effect of HPVs, causes the markedly increased risk of skin cancer in 
immunosuppressed patients. To investigate the effects of papillomavirus on 
carcinogen-driven skin cancer, we colonized several strains of immunocompetent 
mice with mouse papillomavirus type 1 (MmuPV1). Mice with natural immunity 
against MmuPV1 after colonization and acquired immunity through the transfer of 
T cells from immune mice or by MmuPV1 vaccination were protected against skin 
carcinogenesis induced by chemicals or by ultraviolet radiation in a manner 
dependent on CD8+ T cells. RNA and DNA in situ hybridization probes for 25 
commensal β-HPVs revealed a significant reduction in viral activity and load in human 
skin cancer compared with the adjacent healthy skin, suggesting a strong immune 
selection against virus-positive malignant cells. Consistently, E7 peptides from 
β-HPVs activated CD8+ T cells from unaffected human skin. Our findings reveal a 
beneficial role for commensal viruses and establish a foundation for immune-based 
approaches that could block the development of skin cancer by boosting immunity 
against the commensal HPVs present in all of our skin. 


Cutaneous squamous cell carcinoma (SCC) is the second-most-common 
type of cancer and is associated with severe morbidity and mortality, 
especially among immunosuppressed patients such as recipients 
of organ transplants. Although ultraviolet (UV) radiation is the main 
and preventable cause of skin cancer, the incidence of skin cancer in the 
United States doubled from 1992 to 2012, highlighting the urgent 
need to develop new approaches for the prevention and treatment 
of this disease. Given that β-HPVs have been found in more than 80% 
of SCCs among recipients of organ transplants, a potential viral cause 
of skin cancer has been proposed. However, unlike high-risk α-HPVs, 
no predominant types of β-HPV have been identified in skin cancers, 
and the β-HPV genome is rarely integrated into the DNA of cancer cells 
and is not transcriptionally active. Findings like these have led to a 
‘hit-and-run’ hypothesis, in which β-HPV facilitates the initiation of 
UV-driven skin cancer but is later lost during tumour maintenance. 

To investigate the role of papillomavirus in carcinogen-driven skin 
cancer, we used a MmuPV1 back-skin infection system, which led to the 
development of confluent warts in Cd4−/−Cd8−/− (Cd8 is also known as 
Cd8a) mice but no skin lesions in immunocompetent, wild-type mice 
(Extended Data Fig. 1a–c). Two months after infection, MmuPV1- and 
sham-infected mice of the C57BL/6J strain were subjected to a chemical 
carcinogenesis protocol for 30 weeks. Notably, MmuPV1-colonized 
mice showed a significant delay in the onset of skin tumours, developed 
significantly fewer tumours over time and completed the study with 
a significantly lower tumour burden, compared with sham-infected 
controls (Fig. 1a–c, Extended Data Fig. 1d). In the FVB strain, 23% of 
MmuPV1-infected wild-type mice showed complete immunity five weeks 
after infection (that is, no skin warts; Fig. 1d, Extended Data Fig. 1e). At 
ten weeks after infection, warts had completely regressed in 58% of the 
wart-bearing mice, indicative of antiviral adaptive immunity. Adoptive 
transfer of memory T cells from mice that are immune to MmuPV1 into 
Cd4−/−Cd8−/− mice led to fewer warts in these mice after MmuPV1 back-skin 
infection than were observed in control T cell-deficient mice and 
Cd4−/−Cd8−/− mice that received memory T cells from wild-type mice 
(b, c); *P < 0.05. d, Representative images of mice in DMBA-UV-treated 
cohorts. Note the resemblance of DMBA-UV-induced skin tumours to 
actinic keratosis and SCC in humans. Scale bar, 1 cm. e, Representative 
images of CD8+ T cells in the skin of MmuPV1/DMBA-UV and sham/ 
DMBA-UV mice at the completion of the carcinogenesis protocol. Arrows 
indicate CD8+ T cells in the epidermis; dashed lines highlight the 
epidermal basement membrane. Scale bar, 100 μm. f, CD8+ T cell 
infiltrates in MmuPV1/DMBA-UV (n = 10) and sham/DMBA-UV (n = 9) 
skin, quantified in ten randomly selected high-power field (HPF) images 
per mouse and averaged across the mice in each group (two-tailed 
unpaired t-test). g, The ratio of epidermal CD8+ T cells (that is, CD8+ TRM 
cells) to total T cells in each HPF image calculated across MmuPV1/ 
DMBA-UV (n = 10) and sham/DMBA-UV (n = 9) groups (two-tailed 
unpaired t-test). Each dot represents one high-power image. Stained cells 
Fig. 3 | Reduced β-HPV transcripts in skin cancer cells and presence of β-HPV-specific 
CD8+ T cells in healthy human skin indicates a selective pressure 
by antiviral immunity against malignant cells with active HPV. 
a, Representative SCC sections from immunosuppressed and immunocompetent 
patients, stained with haematoxylin and eosin (H&E) or by β-HPV RNA ISH (red 
dots). The wart sample serves as a positive control and exhibits the greatest 
amount of β-HPV activity. Hypertrophic actinic keratosis (HAK) arising in 
association with a wart (HAK in verruca) is another example of a β-HPV-active 
lesion found on the skin of immunosuppressed patients. Insets highlight the 
representative areas of the cancer or wart and their adjacent normal skin. Scale 
bars, 100 μm. b, β-HPV RNA ISH signals quantified in paired samples of skin cancer 
and its adjacent normal skin, collected from immunosuppressed (n = 38) and 


treated with parvovirus vaccine (Extended Data Fig. 2a, b). T cells from 
MmuPV1-immune mice also provided immunity (that is, wart rejection) 
to wild-type FVB mice with persistent warts (Fig. 1e). MmuPV1-colonized 
immune FVB mice that received 7,12-dimethylbenz[a]anthracene 
(DMBA) and 12-O-tetradecanoylphorbol-13-acetate (TPA) for 20 weeks 
were protected against chemical carcinogenesis compared with sham-infected 
mice (Fig. 1f–i). Furthermore, mice with acquired immunity after 
T cell transfer were also protected from chemical carcinogenesis (Fig. 1j). 
The MmuPV1 specificity of the transferred T cells from MmuPV1-immune 
mice was further substantiated by their inability to protect against the 
growth of SCC cells that were not infected with MmuPV1 (Extended Data 
Fig. 2c). At the completion of the carcinogenesis studies, MmuPV1 viral 
DNA and anti-MmuPV1 antibodies were detectable in the normal skin 
and blood, respectively, of MmuPV1-colonized mice (Extended Data 
Fig. 3a–d). Although there was no change in overall levels of immune 
immunocompetent (n = 32) patients (two-tailed paired t-test). Skin cancer 
characteristics are listed in Supplementary Table 2. c, d, Representative flow 
cytometry plots (c) and quantification (d) of CD69+CD8+ and CD137+CD69+CD8+ 
T cells isolated from human facial skin and used in a β-HPV peptide stimulation 
assay (n = 6 biological replicates for each treatment condition). T cells were 
isolated from 8 samples of facial skin (6 males and 2 females; average age 75; age 
range 60–89). The percentage of CD8+ T cells in each quadrant is shown on the 
flow cytometry plots. Stimulation with phorbol-12-myristate-13-acetate and 
ionomycin (PMA/ion.) was used as a positive control. For details of the peptide 
pool, see Methods and Supplementary Table 3. Two-tailed Mann–Whitney U test; 
*P < 0.05, **P < 0.01; NS, not significant compared with negative control. RNA ISH 
signals were counted blindly. Data are mean + s.d. (d). 


cell infiltrates, MmuPV1-colonized skin had an increased ratio of CD8+ 
tissue-resident memory T (TRM) cells in the epidermis to total T cells in 
the skin (Extended Data Fig. 3e–h). DMBA-TPA-induced skin tumours 
in MmuPV1-colonized mice showed similar proliferative and mutational 
signatures to those in sham-infected mice and lacked MmuPV1 viral 
transcripts (Extended Data Fig. 3i–k). 

To examine the effect of MmuPV1 on carcinogenesis that is driven 
by UV radiation, MmuPV1 back-skin infection was performed in immunocompetent 
SKH-1 mice (Extended Data Fig. 4a). MmuPV1-infected 
immune mice that received a single immunosuppressive dose of 
ultraviolet light B (UVB; 300 mJ cm−2) at three months after MmuPV1 
infection developed warts, indicating the long-term persistence of 
MmuPV1 colonization of the skin (Extended Data Fig. 4b, c). To avoid 
immunosuppressive UV exposure, MmuPV1- and sham-infected mice 
were treated with DMBA a week before undergoing treatment with UVB 
(100 mJ cm−2) three times a week for 25 weeks. MmuPV1-infected SKH-1 
mice developed significantly fewer tumours over time and had a markedly 
lower tumour burden at the completion of the study compared to 
sham-infected controls (Fig. 2a–d). A small subset of SKH-1 mice that 
had persistent warts two months after MmuPV1 back-skin infection 
were vaccinated with MmuPV1 live virus particles intraperitoneally 
three times over a two-week period. Four weeks later, five out of nine 
mice developed immunity against MmuPV1, as demonstrated by the 
rejection of their persistent warts (Extended Data Fig. 4d). The mice with 
acquired immunity against MmuPV1 developed markedly fewer skin 
tumours compared with the non-immune mice (P = 0.0159; Extended 
Data Fig. 4d, e). We detected a significant increase in the total number 
of CD8+ T cells and the ratio of epidermal CD8+ TRM cells to total T cells in 
the skin of MmuPV1-colonized mice compared with their sham-infected 
controls at the completion of the UV carcinogenesis protocol (Fig. 2e–g). 
Furthermore, the total numbers of T cells and CD8+ T cells were markedly 
increased in skin tumours of MmuPV1-colonized mice (Extended Data 
Figs. 4f–m, 5a–c). The levels of skin and tumour-infiltrating CD3−CD45+ 
leukocytes and CD4+ T cells were unchanged between the two groups 
(Extended Data Figs. 4f–m, 5d–f). 

To determine the role of CD8+ T cells in mediating the anti-tumour 
immunity induced by papillomavirus skin colonization, SKH-1 mice 
were infected with MmuPV1 or sham-infected with MmuPV1 virus-like 
particles (sham(VLP)). MmuPV1- and sham(VLP)-infected mice 
underwent CD8+ T cell depletion, mediated by anti-CD8 antibodies, 
together with the UV carcinogenesis protocol (Extended Data Fig. 5g, 
h). Notably, MmuPV1-colonized SKH-1 mice that were treated with IgG 
control developed markedly fewer tumours compared to the MmuPV1-colonized 
mice that underwent T cell depletion, and compared with 
both the IgG- and anti-CD8-antibody-treated control groups that were 
infected with sham(VLP) (Extended Data Fig. 5i, j). Consistent with our 
findings in other immunocompetent strains of mice, MmuPV1-colonized 
Xpc−/− mice, which are deficient in the ability to repair UV-induced DNA 
mutations, were protected from skin cancer compared to their sham-infected 
controls (Extended Data Fig. 5k–n). 

To determine whether β-HPVs have a similarly protective role in 
human skin, we used β-HPV RNA in situ hybridization (RNA ISH) to simultaneously 
detect the E6/7 transcripts of 25 types of β-HPV in human 
tissue sections (Extended Data Fig. 6). In contrast to skin lesions from 
an immunosuppressed patient, expression of β-HPV RNA was largely 
absent in the cancer cells of an SCC from an immunocompetent patient 
(Fig. 3a). Expression of β-HPV RNA was significantly reduced in cancer 
cells compared to adjacent normal skin keratinocytes among immunocompetent 
and immunosuppressed patients (Fig. 3b). The skin lesions 
of immunosuppressed patients had significantly higher levels of β-HPV viral 
transcripts compared to skin lesions and samples of normal facial skin 
from immunocompetent patients (Extended Data Fig. 7a–e). β-HPV DNA 
in situ hybridization (DNA ISH) probes for 25 types of β-HPV (Extended 
Data Fig. 7f) detected higher viral load in an SCC from an immunosuppressed 
patient compared to an SCC from an immunocompetent patient 
(Extended Data Fig. 8a). β-HPV viral load was reduced in cancer cells 
compared to the adjacent normal skin of immunosuppressed patients 
(Extended Data Fig. 8b), and this reduction was more pronounced in 
the lesions of immunocompetent patients (Extended Data Fig. 8c). The 
higher viral activity and load in the skin cancers of immunosuppressed 
patients correlated with significantly fewer tumour- and skin-infiltrating 
CD8+ T and CD103+CD8+ TRM cells in their skin cancers compared with 
samples from immunocompetent patients (Extended Data Fig. 9a–c). 
Notably, β-HPV E7 peptides activated CD8+ T cells isolated from the 
normal facial skin of immunocompetent adults (Fig. 3c, d, Extended 
Data Fig. 9d). By contrast, high-risk HPV16 E7 peptides did not activate 
skin-derived CD8+ T cells (Fig. 3c, d, Extended Data Fig. 9d). 

To identify the signals that lead to papillomavirus antigen presentation 
to T cells after abnormal proliferation of keratinocytes, we performed 
RNA sequencing (RNA-seq) on skin warts, MmuPV1-infected 
DMBA-UV-treated skin and tumours, and sham-infected DMBA-UV-treated 
skin and tumours of SKH-1 mice (Extended Data Fig. 10a–c). 
Among the 20 genes that were upregulated in both MmuPV1-induced 
warts and DMBA-UV-induced tumours (from both MmuPV1- and 
sham-infected groups) compared with skin (also from both groups), 
there were several immune-related genes, including the damage-associated 
molecular pattern (DAMP) genes S100a8 and S100a9 
(Extended Data Fig. 10c). In human SCCs and warts, we confirmed 
the induction of S100 genes compared with normal skin and with seborrheic 
keratosis, a benign skin growth, in which S100A8 and S100A9 
genes were downregulated compared with normal skin (Extended 
Data Fig. 10d–f). 

The findings presented herein reveal a previously unrecognized role 
for commensal HPVs in cancer development. Using the colonization of 
skin by papillomavirus as a model, we show that MmuPV1-infected immunocompetent 
mice are protected against skin cancer that is induced by 
chemicals or UV radiation, in a CD8+ T cell-dependent manner. Although 
specific-pathogen-free (SPF) mice may not fully reproduce the complex 
microbiome of human skin, our findings strongly suggest that antiviral 
adaptive immune responses define the role of papillomaviruses in 
skin carcinogenesis. Our discovery of β-HPV-specific CD8+ T cells in the 
normal human skin is indicative of an adaptive immunity that is primed 
against commensal HPVs in healthy adults at baseline. These T cells that 
reside in the skin can target keratinocytes with active virus during their 
abnormal proliferation to form a wart or a skin cancer. Accordingly, 
T cell-based vaccines against commensal HPVs may provide an innovative 
approach to boost this antiviral immunity in the skin and help 
prevent warts and skin cancers in high-risk populations. In addition, 
increasing anti-HPV immunity may improve the efficacy of immune 
checkpoint blockade therapy against SCC. Given the emerging diversity 
of the skin virome, it is critical to characterize the viral communities 
that reside in the skin of immunocompetent and immunosuppressed 
individuals and determine how these viruses contribute to human health 
and disease. 
Methods 


Human tissue studies 

Discarded de-identified human tissue samples were obtained through 
Mohs surgery clinics and the pathology department at Massachusetts 
General Hospital. The skin lesions and normal skin samples were pro- 
cessed for immune cell or RNA isolation, or obtained as formalin-fixed 
paraffin-embedded sections for histological assays. 


Animal studies 

All mice were housed under pathogen-free conditions in the animal 
facilities at Massachusetts General Hospital and the University of Louisville 
in compliance with animal care and all relevant ethical regulations. 
Six-to-ten-week-old female C57BL/6J mice (Jackson Laboratory; 
000664), female FVB mice (Charles River; 207), female SKH-1 Elite mice 
(Charles River; 477) and male and female Xpc−/− mice (Jackson Laboratory; 
010563) were used in the immunocompetent arms of this study. 
Female Cd4−/−Cd8−/− mice in the FVB background were used as T cell-deficient 
hosts (provided by D. G. DeNardo) (Cd8−/−: Jackson Laboratory; 
032563). Age- and gender-matched groups of mice were used in 
all experiments. Wherever possible, mice were randomized into test 
versus control groups and power analysis was used to determine the 
optimal number of mice in each group. In tumour studies, the onset of 
skin tumours and tumour counts were recorded from the time of DMBA 
treatment (week 0) and the maximum tumour diameter allowed was 
2 cm. MmuPV1-infected mice were housed in a biocontainment unit in 
an animal facility at University of Louisville in accordance with animal 
care regulations. 
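
The text states that mice were randomized into test and control groups but does not describe the procedure. As a hedged illustration only, one simple way to do this is a seeded shuffle followed by an even split; the function name, the mouse identifiers and the fixed seed below are all hypothetical.

```python
import random


def randomize_groups(mouse_ids, seed=0):
    """Randomly split mice into 'test' and 'control' groups.

    A fixed seed makes the allocation reproducible. With an odd number
    of mice the control group receives the extra animal (assumption).
    """
    rng = random.Random(seed)      # independent, seeded generator
    shuffled = list(mouse_ids)     # copy so the input is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "test": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }


# Example with ten hypothetical, age- and sex-matched mice.
groups = randomize_groups([f"mouse_{i:02d}" for i in range(10)], seed=42)
```

In practice the split would be done within each age- and sex-matched stratum so that the matching described above is preserved.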


Statistics and reproducibility 

A two-tailed Mann–Whitney U test was used for tumour counts and T cell-activation 
assays. A two-tailed paired t-test was used for comparing RNA 
ISH and a two-tailed Wilcoxon matched-pairs signed-rank test for comparing 
DNA ISH signal counts between skin cancers and their adjacent 
normal skin. A two-tailed unpaired t-test was used for immunostained 
cell counts, RNA ISH signal counts comparing skin lesions to normal 
human skin and other continuous variables. A log-rank test was used 
as the test of significance for the time to tumour onset outcomes and 
a two-tailed Fisher’s exact test for skin cancer anatomical distribution 
outcomes. Pearson’s χ² tests were used for other categorical variables. 
A P value of less than 0.05 was considered significant. All bar graphs 
and dot plots show either mean + s.d. or mean ± s.d., as indicated. Data 
are representative of at least two independent sets of experiments with 
similar results. 
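
To make one of the tests named above concrete, the following Python sketch computes a two-tailed Fisher's exact test for a 2×2 table from first principles. It is an illustration, not the authors' analysis code; the statistical software used is not stated in this section, and the counts in the example are invented. The two-tailed P value here sums the probabilities of all tables that are no more likely than the observed one, which is one common definition.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher's exact test P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    tolerance = 1 + 1e-9   # guards against floating-point ties
    p_value = sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * tolerance
    )
    return min(1.0, p_value)


# Hypothetical counts: tumours at two anatomical sites in two groups.
p = fisher_exact_two_tailed(3, 1, 1, 3)
significant = p < 0.05   # threshold stated in the text
```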


Study approval 

Analysis of de-identified samples of human tissue was reviewed and 
approved by Massachusetts General Hospital Institutional Review Board 
(IRB). Massachusetts General Hospital and University of Louisville Insti- 
tutional Animal Care and Use Committee (IACUC) approved the animal 
studies. 


Purification of MmuPV1 

MmuPV1 viral stock was purified from MmuPV1-induced muzzle warts 
of B6.Cg-Foxn1nu/Foxn1nu mice using the caesium chloride gradient 
method following a protocol described previously. In brief, muzzle 
warts of B6.Cg-Foxn1nu/Foxn1nu mice were homogenized by pulverization 
with a mortar and pestle in liquid nitrogen, and then homogenized 
with a tissue grinder (DWK Life Sciences; 885450-0023). The tissue was 
then subjected to three freeze-thaw cycles between liquid nitrogen 
and a 37 °C water bath, and sonicated for two minutes (amplitude of 
20, 10-s pulse). Caesium chloride (Sigma-Aldrich; 289329) dissolved in 
phosphate-buffered saline (PBS) was added to the wart homogenate 
for a final density of 1.3623 g ml−1, determined using a refractometer 
(product discontinued). The tissue was ultracentrifuged overnight 
at 36,000 r.p.m. and opaque bands at densities ranging from 1.27 to 
1.31 g ml−1 were extracted. Extracted bands were dialysed three times 
for eight hours using Slide-A-Lyzer cassette (VWR; P166230) in 1× PBS. 
The purity of the viral preparation was confirmed using SDS-PAGE. 


MmuPV1 inoculation 

Back skin of the wild-type, Xpc−/− and Cd4−/−Cd8−/− mice was shaved with 
an electric razor and waxed. Next, skin was scarified using a nail file for 
10–20 passages across the skin to generate microaberrations in the skin 
barrier, which was accompanied by skin erythema. Purified virus inoculum 
(20 μl) was pipetted onto scarified skin and spread homogeneously. 
The same viral inoculum was used for all infected mice, which yielded the 
development of confluent warts on the back skin of T cell-deficient FVB 
mice. Sham-infected mice received 20 μl sterile normal saline topically 
after skin aberration. Vaseline gauze (McKesson; 61-20056) was cut to 
fit the site of the injury and applied under a standard adhesive bandage. 
Meloxicam (0.5 mg kg−1, Boehringer Ingelheim Vetmedica) was injected 
subcutaneously for pain relief and again the next day. Bandages were 
removed 48 h after inoculation and 200 μl sterile normal saline was 
injected subcutaneously into any lethargic mice. 


PCR detection of MmuPV1 in mouse skin 

To confirm skin colonization after MmuPV1 back-skin infection and at 
the completion of carcinogenesis protocols, DNA was isolated from 
the skin biopsies using the DNeasy Blood & Tissue Kit (Qiagen; 69506). 
PCR amplification of the MmuPV1 L1 gene was performed following a 
previously described method (primers are listed in Supplementary 
Table 4b). 


Wart development 

For ten weeks following viral infection or sham infection, mice were 
monitored for the development of warts. As previously described, mice 
with warts that lasted for longer than two months were considered to 
have ‘persistent’ warts. We classified these mice as ‘non-immune’ and 
they were subjected to T cell transfer or MmuPV1 vaccination before 
entering the chemical and UV carcinogenesis studies. Mice that showed 
either no wart development or spontaneous wart rejection were classified 
as ‘immune’ and entered into carcinogenesis studies. MmuPV1 vaccination 
in wart-bearing SKH-1 mice was performed by intraperitoneal 
injection of MmuPV1 virus inoculum in 200 μl sterile PBS three times 
over two weeks. 
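
The immune versus non-immune classification described above amounts to a simple decision rule, sketched below in Python. This is an illustration under stated assumptions, not the authors' procedure: "two months" is approximated as 8 weeks, observations are recorded in whole weeks, and the function name, arguments and the `"undetermined"` label are hypothetical.

```python
from typing import Optional

PERSISTENCE_WEEKS = 8   # assumption: "two months" taken as 8 weeks
MONITORING_WEEKS = 10   # monitoring period stated in the text


def classify_immunity(
    wart_onset_week: Optional[int],
    wart_cleared_week: Optional[int],
    observed_weeks: int = MONITORING_WEEKS,
) -> str:
    """Classify a mouse from its wart history after infection.

    wart_onset_week: week warts first appeared, or None if never.
    wart_cleared_week: week warts were rejected, or None if still present.
    """
    if wart_onset_week is None:
        return "immune"          # no wart development
    if wart_cleared_week is not None:
        return "immune"          # spontaneous wart rejection
    duration = observed_weeks - wart_onset_week
    if duration > PERSISTENCE_WEEKS:
        return "non-immune"      # persistent warts
    # Warts present but not yet observed for longer than two months.
    return "undetermined"
```

Mice classified as non-immune would then go on to T cell transfer or vaccination, as described in the text.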


Isolation and transfer of T cells 

MmuPV1-colonized FVB mice that never developed warts or exhibited 
spontaneous regression of warts by ten weeks following infection 
(immune mice) were used as T cell donors. A single-cell suspension of 
CD4+ and CD8+ T cells from skin-draining lymph nodes was prepared 
using the EasySep Mouse T Cell Isolation Kit (StemCell Technologies; 
19851). To assess the MmuPV1-specific nature of T cells from MmuPV1-colonized 
immune mice, we transferred their sorted CD4+ and CD8+ 
T cells from skin-draining lymph nodes into Cd4−/−Cd8−/− recipients. 
Donor mice were injected intravenously with 2 μg CD45-APC (BioLegend; 
103112) three minutes before collection to exclude any circulating 
T cells. At collection, single-cell suspensions of skin-draining lymph 
nodes were stained with CD3e-PE-Cy7 (BioLegend; 100320), CD4-APC-Cy7 
(BioLegend; 100414), CD8a-FITC (BioLegend; 100706) and 
CD62L-PerCP/Cy5.5 (BioLegend; 104432) (Supplementary Table 4a). 
Sorted CD45−CD3+CD4+CD62Llow and CD45−CD3+CD8+CD62Llow donor 
memory T cells were injected intravenously into Cd4−/−Cd8−/− mice at 
129,600 cells per mouse (6:1 CD4+:CD8+ ratio) in 200 μl sterile normal 
saline. As a control for MmuPV1-specific T cells, a group of wild-type 
FVB mice was vaccinated against an unrelated virus (mouse parvovirus 
type 1) in parallel with MmuPV1-infected T cell donor mice to 
propagate a population of T cells that would not respond to MmuPV1. 
This group of T cell donors was vaccinated with a cocktail of 50 μg 
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polyinosinic—polycytidylic acid (poly(I:C); Sigma-Aldrich; P1530) 
combined with mouse parvovirus virus-like particles (VLPs) in 200 pl 
sterile normal saline delivered by subcutaneous injection at four sites 
(50 pl per site per vaccination) on the back skin at 30 days and 3 days 
before T cell transfer. A total of 200 pl of 5% imiquimod (Sigma-Aldrich; 
1338313) dissolved in dimethyl sulfoxide (DMSO) and diluted in 100% 
EtOH (Sigma-Aldrich; 276855) was applied topically after each vac- 
cination. T cell recipients (T cell-deficient Cd4"Cd8“ and wild-type 
FVB mice) were infected with MmuPVI two days after T cell transfer, 
including mice that received T cells from parvovirus vaccine, and topi- 
cal imiquimod-treated donors. Another subgroup of MmuPVI1T cell 
recipients, including T cell-deficient Cd4’Cd8“ and wild-type mice, 
received an injection of DMBA-TPA-induced primary SCC cells into 
their right flank and were monitored for tumour growth (Extended 
Data Fig. 2a). Mice were monitored closely for wart development in 
MmuPV1 infection cohorts and SCC growth in tumour cohorts for 
two months, including photographs and tumour-size measurements. 
To examine the presence or absence of T cells in the recipient mice, 
peripheral blood was collected from the mice three weeks after the 
T cell transfer. Around 2-3 drops of blood per mouse, extracted from 
the submandibular vein, was collected in 10 ml RBC lysis buffer (Biole- 
gend; 420301), stained with CD3e-PE-Cy7, CD4—APC-Cy7 and CD8a- 
FITC, and examined by flow cytometry. One million T cells in 200 ul 
sterile normal saline from skin-draining lymph nodes of MmuPVI1- 
colonized (immune) mice versus naive T cells were injected intra- 
venously into the tail vein of wart-bearing (non-immune) wild-type 
FVB mice. The recipient mice were monitored for the resolution of 
their skin warts and their response to skin chemical carcinogenesis. 


Chemical carcinogenesis 

After infection and evidence of immunity to MmuPV1, C57BL/6J and
FVB mice underwent a skin chemical carcinogenesis protocol. All mice
were shaved and seven days later received a single dose of 100 µg DMBA
(Sigma-Aldrich; D3254) in 200 µl acetone on the back skin. One week
later, treatments with 6 µg TPA (Sigma-Aldrich; P1585) dissolved in 200 µl
acetone were initiated (three times per week for 30 weeks in C57BL/6J
and two times per week for 20 weeks in FVB cohorts). Throughout the
carcinogenesis protocol, tumours were counted every week and photo- 
graphs were collected every other week. The final tumour burden was 
determined based on the total number of palpable skin lesions that had 
developed on the back skin of the mice. 


UV carcinogenesis 

Following infection and evidence of MmuPV1 immunity, SKH-1 and
Xpc−/− mice underwent a UV skin carcinogenesis protocol. Mice
received a single dose of 50 µg (SKH-1) or 100 µg (Xpc−/−) DMBA in
200 µl acetone on the back skin. One week later, mice received
narrow-band ultraviolet B (UVB) (302-312 nm) 3 times weekly for up
to 25 weeks (SKH-1) or 30 weeks (Xpc−/−) via a UVP Black-Ray Lamp
UVB (VWR; 36575-052), which was periodically calibrated using an
International Light IL1400A Digital Light Meter (International Light
Technologies). Mice received 100 mJ cm−2 UVB at each UV treatment
time point. This is considered a sub-erythemic dose for a fair-skinned
individual of average tanning ability (Fitzpatrick skin types I and II),
which approximates to 25-50 min of sun exposure in Florida at midday
in the summer. Throughout the carcinogenesis protocol, tumours
were counted every week and photographs were collected every other 
week. Any palpable discrete lesion that was discontinuous or separate 
from other lesions was considered a tumour. Tumour counts were 
performed by a single individual to maintain consistency from week 
to week. The final tumour burden was determined on the basis of the 
total number of palpable discrete skin lesions that had developed on 
the back skin of the mice after DMBA treatment. For the
immunosuppressive UV-dosing experiment, SKH-1 mice received
300 mJ cm−2 UVB on their back skin once.
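
The maximum cumulative UVB exposure implied by this schedule follows directly from the per-session dose, the weekly frequency and the protocol length. The sketch below is illustrative arithmetic only, not part of the reported methods: it assumes every scheduled session was delivered at the nominal dose, and the names `UvSchedule` and `cumulative_dose_mj_cm2` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UvSchedule:
    """Hypothetical container for one strain's UVB dosing schedule."""
    strain: str
    dose_mj_cm2: float       # UVB dose per session (mJ cm^-2)
    sessions_per_week: int
    max_weeks: int           # "up to" this many weeks in the protocol

    def cumulative_dose_mj_cm2(self) -> float:
        """Upper bound on total dose, assuming no missed sessions."""
        return self.dose_mj_cm2 * self.sessions_per_week * self.max_weeks


# Values taken from the protocol text; the structure itself is an assumption.
schedules = [
    UvSchedule("SKH-1", 100.0, 3, 25),
    UvSchedule("Xpc-/-", 100.0, 3, 30),
]

for s in schedules:
    print(f"{s.strain}: up to {s.cumulative_dose_mj_cm2():.0f} mJ cm^-2")
```

Because the protocol ran "for up to" 25 or 30 weeks, these figures are ceilings rather than delivered doses.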


CD8+ T cell depletion

SKH-1 mice were infected with MmuPV1 or sham-infected with MmuPV1
VLPs (L1Met30; 105 µg in 40 µl PBS per mouse) applied to their
abraded back skin. Four weeks later, MmuPV1-infected immune mice
and sham(VLP)-infected controls were started on anti-CD8 (rat
anti-mouse CD8a; BioXCell; YTS 169.4) or IgG (rat isotype control;
Sigma-Aldrich) antibody treatment at 750 µg in 200 µl sterile PBS (first
dose) followed by 250 µg in 200 µl sterile PBS weekly by intraperitoneal
injection (Extended Data Fig. 5g, Supplementary Table 4a). One day after
the first antibody treatment, mice underwent the UV carcinogenesis 
protocol as described above. 


Hras-mutation-specific PCR 

After the carcinogenesis protocol, DNA was extracted from tumours and
skin of MmuPV1-infected, sham-infected or untreated wild-type FVB mice
using the DNeasy Blood & Tissue Kit (Qiagen; 69506). Mutation-specific
primers were designed as previously described with the addition of a
wild-type-specific primer (primers are listed in Supplementary Table 4b).
PCR was performed using 500 ng of genomic DNA, 12.5 pmol of each
primer, 2.5 µl 10X Klentaq1 Reaction Buffer (DNA Polymerase Technology;
RB20), 200 mM dNTPs (Bio Basic; DD0056), 2.0% (v/v) DMSO, 1.25 units
of Klentaq-LA (DNA Polymerase Technology; 110) and water to a final
volume of 25 µl. Amplification was performed as described previously. In
brief, DNA was denatured at 95 °C for 5 min, then cycled 30 times through
denaturation at 95 °C for 1 min, hybridization at 55 °C for 1 min and
extension at 72 °C for 1 min. After cycling, extension was continued for
5 min at 72 °C. PCR products (110 bp) were analysed on a 2% agarose gel
(Genesee Scientific; 20-102QD) and visualized with ethidium bromide.
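
As a quick check of the cycling program described above, the programmed hold time can be added up step by step. This is a minimal sketch for illustration: the constant and function names are hypothetical, and ramp times between temperatures are ignored.

```python
# Hold times (minutes) as stated in the protocol text.
INITIAL_DENATURATION_MIN = 5          # 95 °C
CYCLES = 30
STEPS_PER_CYCLE_MIN = {
    "denaturation_95C": 1,
    "hybridization_55C": 1,
    "extension_72C": 1,
}
FINAL_EXTENSION_MIN = 5               # 72 °C


def total_hold_minutes() -> int:
    """Sum of programmed hold times, excluding temperature ramps."""
    per_cycle = sum(STEPS_PER_CYCLE_MIN.values())
    return INITIAL_DENATURATION_MIN + CYCLES * per_cycle + FINAL_EXTENSION_MIN


print(total_hold_minutes(), "min of programmed holds")
```

The real run time will be longer, because thermocyclers need time to move between 95 °C, 55 °C and 72 °C in every cycle.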


Histology and immunofluorescence staining 

Samples of mouse tissue were collected and fixed in 4% paraformalde- 
hyde (PFA; Sigma-Aldrich; P6148) overnight at 4 °C. Next, tissues were 
dehydrated in ethanol, processed and paraffin-embedded. Sections 
(5 µm) of paraffin-embedded tissues from mice and humans were cut,
deparaffinized and stained with H&E. For immunofluorescence staining,
rehydrated tissue sections were permeabilized with 1x PBS supplemented
with 0.2% v/v Triton X-100 (Thermo Fisher Scientific; BP151) for 5 min.
Antigen retrieval was performed in antigen unmasking solution (Vector 
Laboratories; H-3300) using a Cuisinart pressure cooker for 20 min at 
high pressure. Slides were washed three times for three minutes each in
1x PBS supplemented with 0.1% v/v Tween 20 (Sigma-Aldrich; P1379).
Sections were blocked with 5% (m/v) bovine serum albumin (Thermo Fisher
Scientific; BP1600) and 5% (v/v) goat serum (Sigma-Aldrich; G9023). The 
slides were stained overnight at 4 °C with primary antibodies (Supple- 
mentary Table 4a). The following day, slides were washed as above and 
incubated for two hours at room temperature with secondary antibodies 
conjugated to fluorochromes (Supplementary Table 4a). After washing 
as above, slides were incubated with 1:4,000 DAPI (Invitrogen; D3571) 
for five minutes at room temperature, then washed as above. Slides were 
mounted with Prolong Gold Antifade Reagent (Invitrogen; P36930). 
Once stained, ten randomly selected images of the tissue at 200x
magnification (that is, HPF) were obtained for each section. Blinded manual
counting of CD3+, CD4+, CD8+, CD103+ and CD45+ cells was performed
using the ZEN Blue ‘event’ tool (Zeiss). Positive cells were determined
by comparing fluorescent intensity to the background, which was
minimized using ZEN. Further analyses were performed based on the number
of double-positive cells (for example, CD3+CD8+) and the number of
T cell subtypes in the epidermal compartment over the total number
of CD3+ T cells in each image.
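
The per-image ratio described above (epidermal T cell subtype count over total CD3+ T cells) can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: the function names and example counts are hypothetical, and skipping images that contain no CD3+ cells and averaging the remaining images are choices made here for the sketch.

```python
from statistics import mean
from typing import List, Optional, Tuple


def epidermal_fraction(epidermal_subset: int, total_cd3: int) -> Optional[float]:
    """Fraction of CD3+ T cells in one HPF image that are epidermal subset cells.

    Returns None when the image contains no CD3+ cells, because the ratio
    is undefined in that case.
    """
    if total_cd3 == 0:
        return None
    return epidermal_subset / total_cd3


def mean_fraction(images: List[Tuple[int, int]]) -> Optional[float]:
    """Average the per-image fractions, skipping images with undefined ratios."""
    fractions = [
        f for f in (epidermal_fraction(e, t) for e, t in images) if f is not None
    ]
    return mean(fractions) if fractions else None


# Hypothetical counts: (epidermal CD3+CD8+ cells, total CD3+ cells) per image.
example_images = [(3, 12), (1, 8), (0, 0), (2, 10)]
print(mean_fraction(example_images))
```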


Serology 

Using methods described previously, anti-MmuPV1-specific antibodies
in mouse serum were detected by enzyme-linked immunosorbent
assay (ELISA).


RNA and DNA in situ hybridization 

RNA ISH and DNA ISH were performed on formalin-fixed paraffin- 
embedded human and mouse tissue sections using RNAscope probes 
and protocols (Advanced Cell Diagnostics; Supplementary Table 1).
DNA probes were generated using the sense strand of viral DNA at the 
same binding sites as RNA probes. We used the HybEZ hybridization 
system to perform RNAscope assay hybridization and incubation steps. 
In brief, 5-µm sections were baked in a dry oven for one hour at 60 °C
and immediately deparaffinized in xylene, followed by rehydration
in an ethanol series. Epitope retrieval was performed by placing the
slides in RNAscope 1x Target Retrieval Reagent (Advanced Cell
Diagnostics; 322000) at 102 °C for 15 min, after which the slides were
washed. Protease treatment was performed by adding RNAscope Protease
Plus (Advanced Cell Diagnostics; 322331) to the section and incubating
for 30 min at 40 °C in a HybEZ Oven II (Advanced Cell Diagnostics;
321720). After
probe hybridization with target probes, preamplifier and amplifier, sec- 
tions were stained with Fast RED reagent (RNAscope 2.5 HD Detection 
Reagents—RED; Advanced Cell Diagnostics; 322360). A counterstain 
of 50% haematoxylin and 0.02% ammonia water was used. Positive and 
negative probes were used in each assay to ensure proper controls. 
We used probes to an endogenous housekeeping gene peptidylprolyl 
isomerase B (PPIB) (Advanced Cell Diagnostics; 313901) and the bacterial 
gene dapB (Advanced Cell Diagnostics; 310043) as positive and negative 
controls, respectively. We assessed RNA ISH and DNA ISH red signals 
under a standard bright-field microscope at 400x magnification. Ten
representative areas of skin cancer and normal skin from each slide were
imaged at 400x magnification and positive RNA ISH or DNA ISH signals
and keratinocyte nuclei were counted in each image in a blinded manner.


Quantitative PCR 

RNA samples were extracted from human tissues that were stored in 
Allprotect (Qiagen; 76405) at 4 °C and flash-frozen samples stored at 
-80 °C. A piece of tissue (approximately 50-100 mg) was washed using
sterile 1x PBS and placed into a tube containing a 5-mm TissueLyser
bead, then 600 µl of RNeasy Lysis Buffer (Buffer RLT; Qiagen; 79216)
and 2-mercaptoethanol was added to the sample-bead mixture. The
tissue was homogenized for five minutes through mechanical
manipulation. The liquid was transferred into a new tube, to which 1 ml
TRIzol was added. Using standard Thermo Fisher protocols for TRIzol, the
solution was mixed and centrifuged at 4 °C for ten minutes. The clear
supernatant was collected and 0.2 ml chloroform was added per 1 ml
TRIzol solution. The mixture was centrifuged and the clear supernatant 
retrieved. For extraction of RNA, the Allprep DNA/RNA mini kit was 
used (Qiagen; 80284). The clear supernatant was added to the Allprep 
DNA spin column and the flow through was mixed with one volume of 
70% ethanol. This solution was mixed and applied to the RNeasy spin
column, and standard methods of purification and DNase digestion were
followed. RNA was quantified using a NanoDrop spectrophotometer
(NanoDrop Technologies; ND-1000) and 1 µg RNA was used for the
reverse-transcriptase reaction using the SuperScript III RT Kit (Thermo
Fisher Scientific; 18080044). RNA (1 µg) was mixed with 0.25 mg ml−1
random primers, 10 mM dNTP mix and nuclease-free H2O for a total of
13 µl. This sample was then incubated at 65 °C for five minutes. A mix of
diluted 1x first-strand buffer, 0.1 M dithiothreitol, 40 U µl−1 RNaseOUT
and 200 U SuperScript III was added to the nucleotide mix. The sample
was then incubated in a thermocycler. The program consisted of 5 min
at 25 °C, one hour at 50 °C and 15 min at 70 °C. Following PCR, cDNA
samples were diluted 1:9 using UltraPure DNase/RNase-Free Distilled
Water. Of the 1:9 dilution, 3 µl was used in the total 10-µl quantitative
PCR (qPCR) reaction. For forward and reverse primers (Integrated DNA
Technologies; Supplementary Table 4b), 0.5 µl of 10 µM concentration
was used. Five microlitres of SYBR Green master mix was used along with
1 µl UltraPure DNase/RNase-Free Distilled Water per reaction for keratin
14 and β-HPV qPCRs. For other gene-expression analyses, a premixed
cocktail of primers and probes was added in addition to PrimeTime Gene 
Expression Master Mix according to the manufacturer’s instructions 
(Integrated DNA Technologies). The qPCR was run on LightCycler 480 
II (Roche; 05015278001). qPCR products were verified by
electrophoresis on a 1% agarose gel at 120 V for 60 min. Analysis of
relative gene
expression was performed in triplicate for each sample by comparing 
the test genes to GAPDH as the reference gene. The average relative 
gene expressions from the normal skin samples were used to normalize 
the relative gene expressions in SCCs, warts and seborrheic keratoses. 
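
The text does not spell out the formula behind the relative-expression values. One common way to implement a design like this is to express each sample as 2^(-ΔCt) relative to the reference gene and then divide by the mean of the normal-skin samples. The sketch below assumes that approach; the function names and all Ct values are hypothetical and are not data from the study.

```python
from statistics import mean
from typing import Sequence


def relative_to_reference(target_cts: Sequence[float],
                          reference_cts: Sequence[float]) -> float:
    """Expression of a test gene relative to the reference gene (GAPDH here).

    Uses the mean of the technical replicates (triplicates in the text) and
    assumes roughly 100% amplification efficiency, that is 2 ** -(delta Ct).
    """
    delta_ct = mean(target_cts) - mean(reference_cts)
    return 2.0 ** -delta_ct


def normalize_to_normal_skin(sample_value: float,
                             normal_values: Sequence[float]) -> float:
    """Divide a sample's relative expression by the normal-skin average."""
    return sample_value / mean(normal_values)


# Hypothetical triplicate Ct values (test gene, GAPDH) for two normal skins.
normal_skin = [
    relative_to_reference([26.0, 26.2, 25.8], [18.0, 18.1, 17.9]),
    relative_to_reference([25.0, 25.1, 24.9], [17.0, 17.0, 17.0]),
]
# Hypothetical lesion sample.
scc = relative_to_reference([24.0, 23.9, 24.1], [18.0, 17.9, 18.1])

fold_change = normalize_to_normal_skin(scc, normal_skin)
print(f"fold change vs normal skin: {fold_change:.2f}")
```

If primer efficiencies differ noticeably from 100%, an efficiency-corrected model would be a better fit than the fixed base of 2 used here.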


RNA-seq and analysis 

Total RNA was extracted from the warts of SKH-1 mice after MmuPV1 
back-skin infection and skin and tumours of SKH-1 mice after completion 
of the UV carcinogenesis protocol using the RNeasy Mini Kit (Qiagen; 
74104) according to the manufacturer’s instructions. A total of 2 µg RNA
per sample was used for RNA sample preparations. RNA integrity was 
assessed with an Agilent Bioanalyzer 2100. Libraries were prepared by 
Novogene using the NEBNext Ultra RNA Library Prep Kit for Illumina 
(New England Biolabs; E7770). Sequencing was performed by Novo- 
gene using the Illumina NovaSeq 6000 System. Reads were aligned to 
the mouse reference genome (mm10) using STAR. Differential
expression analysis was performed by Novogene using the DESeq2 R package.
Unsupervised clustering was performed and visualized as principal 
component analysis (PCA) and volcano plots. Original data are available 
in the NCBI Gene Expression Omnibus (GEO) with accession number 
GSE128476. 


Isolation of human T cells and peptide stimulation 

T cells were isolated from human skin as previously described. In brief,
de-identified normal facial skin samples generated as part of Mohs
surgery repair were obtained. Subcutaneous fat tissue was removed from
the skin tissue, and the remaining tissue was minced. Small fragments of
tissue were digested in RPMI 1640 including 1% DNase-I (Sigma-Aldrich)
and 0.2% collagenase-I (Thermo Fisher Scientific) for two hours at 37 °C.
Then cells were passed through a 40-µm cell strainer and incubated in
RPMI 1640 supplemented with 20% (v/v) FBS, 1% (v/v)
penicillin/streptomycin, 1% (m/v) glutamine, 0.00035% (m/v)
2-mercaptoethanol and 50 U ml−1 human recombinant IL-2 (BioLegend).
T cells from human skin were seeded in a 96-well plate and treated with
a pool of five β-HPV E7 peptides (HPV5, HPV8, HPV9, HPV20, HPV38;
5 µg ml−1 of each peptide; custom peptides; JPT), a pool of HPV16 E7
peptides (5 µg ml−1 of each peptide; PepMix HPV 16 (protein E7); JPT;
PM-HPV16-E7) or 50 ng ml−1 phorbol-12-myristate-13-acetate plus
500 ng ml−1 ionomycin. Peptide
pools were generated as 15-mers with an overlap of 11 amino acids across 
the length of E7 proteins. After 24 h of peptide exposure, cells were 
collected, stained with antibodies to surface markers for T cell activa- 
tion (Supplementary Table 4a) and examined by flow cytometry (BD 
LSRFortessa X-20). Flow cytometry data were analysed using FlowJo 
software (Ashland). 
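
The peptide-pool design mentioned above (15-mers overlapping by 11 amino acids, so each window starts 4 residues after the previous one) can be sketched as a sliding window. This is an illustration only: the function name is hypothetical, the input is a placeholder string rather than a real E7 sequence, and adding one extra C-terminal peptide when the last regular window stops short of the end is an assumption, not something the text states.

```python
from typing import List


def overlapping_peptides(sequence: str, length: int = 15,
                         overlap: int = 11) -> List[str]:
    """Tile a protein sequence into fixed-length peptides with a fixed overlap."""
    step = length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than peptide length")
    if len(sequence) <= length:
        return [sequence]
    last_start = len(sequence) - length
    peptides = [sequence[i:i + length] for i in range(0, last_start + 1, step)]
    # Assumption: cover the C-terminus with one extra peptide if needed.
    if last_start % step != 0:
        peptides.append(sequence[-length:])
    return peptides


# Placeholder 40-residue string, NOT a real E7 protein sequence.
placeholder = "ACDEFGHIKLMNPQRSTVWY" * 2
pool = overlapping_peptides(placeholder)
for peptide in pool:
    print(peptide)
```

With the default arguments, consecutive regular peptides share their last and first 11 residues; only the optional C-terminal peptide can overlap its neighbour by more than that.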


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


The data that support the findings of this study are available from the 
corresponding author on reasonable request. RNA-seq data have been 
deposited to the NCBI GEO (accession number GSE128476). 
Extended Data Fig. 1 | Infection of back skin with MmuPV1 in wild-type and
T cell-deficient mice and the effect of MmuPV1 colonization on the
outcomes of chemical carcinogenesis in wild-type C57BL/6J mice. a, Wart
burden in Cd4−/−Cd8−/− mice (right) compared with the absence of warts in
wild-type mice (left) after infection of back skin with MmuPV1, at 10 weeks
after infection. Note the confluent pattern of wart development in the
T cell-deficient mouse. b, MmuPV1-induced wart in a Cd4−/−Cd8−/− mouse,
stained with H&E (left), MmuPV1 L2 RNA ISH probe (middle) and
negative-control RNA ISH probe (right). c, Left, representative images of
the back skin of wild-type C57BL/6J mice on the day of MmuPV1 infection and
21 days after infection. Middle, MmuPV1 L1 PCR on 20 segments of the back
skin. MmuPV1 L1 PCR bands are marked by arrows; PCR amplicon size, 339 bp.
PCR primers, forward: GAGCTCTTTGTTACTGTTGTC; reverse:
ATCCTCTCTTTCCTTGGGC. M, molecular-weight size marker; N, negative
control; P1-P3, positive controls. Right, a typical wild-type C57BL/6J
mouse five weeks after infection, highlighting the absence of warts, which was
the case for 100% of the mice. d, Representative macroscopic images of
wild-type C57BL/6J mice that were either infected with MmuPV1 on their back
skin or sham-infected, and treated with DMBA-TPA. Papillomas and invasive
skin cancers are highlighted with yellow and red circles, respectively. e, Left,
representative images of the back skin of wild-type FVB mice on the day of
MmuPV1 infection and 31 days after infection, and MmuPV1 L1 PCR on 20
segments of the back skin. Mice were shaved for visualization of the skin and
skin tumours. Scale bars, mouse, 1 cm (a, c-e); tissue, 1 mm (b).
Extended Data Fig. 2 | T cells transferred from wild-type MmuPV1-colonized
immune mice to T cell-deficient mice reduce the burden of warts in mice
infected with MmuPV1, but have no effect on the growth of uninfected SCC
cells. a, Schematic of the T cell-transfer experiment. The inset shows the gating
strategy for flow cytometry that was used to select memory T cells. T cell-donor
mice received CD45-APC intravenously (IV) three minutes before the collection
of lymph nodes to label and exclude the circulating immune cells. Note that the
control experiment in which mice were vaccinated with parvovirus was done in
parallel with the MmuPV1 challenge, and the SCC primary tumour growth
experiment was done in parallel with the infection of back skin with MmuPV1.
b, Right, representative images of the warts on the back skin of mice three weeks
after infection with MmuPV1. Scale bar, 1 cm. Left, flow cytometry demonstrates
the presence of CD4+ and CD8+ T cells in the peripheral blood of the recipient
mice, indicating successful adoptive transfer of T cells (n = 4 per group). Wt,
wild type. c, Growth of subcutaneously injected DMBA-TPA-induced primary SCC
tumour cells in wild-type mice (n = 9), Cd4−/−Cd8−/− mice (n = 5) and
Cd4−/−Cd8−/− mice that received T cells from MmuPV1-immune donors
(Cd4−/−Cd8−/− + test T cells) (n = 4). Two-tailed Mann-Whitney U test;
*P < 0.05 compared with the wild-type group. Data are mean + s.d.
Extended Data Fig. 3 | Evidence of MmuPV1 colonization and T cells homing
to the epidermis of MmuPV1-infected mice at the completion of the
chemical carcinogenesis protocol. a, b, MmuPV1 L1 PCR on DNA isolated from
the skin of wild-type C57BL/6J (B6) (a) and the skin and tumour of wild-type FVB
(b) mice more than 6 months after MmuPV1 infection. MmuPV1 L1 PCR bands are
highlighted by arrows; PCR amplicon size: 339 bp. Plus sign indicates positive
control; minus sign, negative control. c, d, Anti-MmuPV1 seroconversion in
DMBA-TPA-treated cohorts of C57BL/6J mice (c; n = 5 per group) and FVB mice
(d; n = 4 per group). Two-tailed Mann-Whitney U test; *P < 0.05, **P < 0.01;
NS, not significant. e, Representative images of CD3/CD45-stained skin from
MmuPV1/DMBA-TPA FVB mice compared with sham/DMBA-TPA controls at the
completion of the chemical carcinogenesis protocol. Arrows indicate T cells in
the epidermis; dashed lines highlight the epidermal basement membrane.
f, CD45+ leukocytes quantified in skin sections of MmuPV1/DMBA-TPA and
sham/DMBA-TPA FVB mice across ten randomly selected HPF images of normal
skin per mouse and averaged across the mice in each group (two-tailed unpaired
t-test; n = 8 per group). Each dot represents the leukocyte count in one
high-power image. g, h, Homing of T cells to the epidermis in MmuPV1/DMBA-TPA
skin compared with sham/DMBA-TPA control skin of wild-type FVB mice.
g, Representative images of CD8/CD3- and CD4/CD3-stained skin sections.
Arrows indicate epidermal CD8+ TRM cells; dashed lines highlight the epidermal
basement membrane. h, The ratio of epidermal CD8+ TRM and CD4+ TRM cells to
total CD3+ T cells in the skin per HPF image (two-tailed unpaired t-test). T cells
in up to ten randomly selected HPF images of normal skin per mouse were counted.
Each dot represents one high-power image. n = 10 (MmuPV1/DMBA-TPA); n = 9
(sham/DMBA-TPA). i, Representative skin tumours from MmuPV1/DMBA-TPA
and sham/DMBA-TPA wild-type FVB mice stained with keratin 6 (K6; a marker for
epidermal hyperplasia) and Ki67 (a proliferation marker). Dashed lines highlight
the epidermal basement membrane in the skin. j, PCR amplification of the
wild-type (A) and mutant (T) region of the Hras gene in DNA of MmuPV1/DMBA-TPA
and sham/DMBA-TPA tumours and skin, and untreated skin from a wild-type
FVB mouse (band size, 110 bp). The A-to-T mutation in Hras codon 61 highlights
DMBA-TPA-induced skin tumours in MmuPV1/DMBA-TPA and sham/DMBA-TPA
wild-type FVB cohorts. k, Matched H&E and MmuPV1 L2 RNA ISH images of a
wart from an MmuPV1-infected Cd4−/−Cd8−/− mouse, and a skin tumour and
normal skin from an MmuPV1/DMBA-TPA wild-type mouse. Note the dense and
confluent RNA ISH signals in the wart from the T cell-deficient mouse. After the
completion of DMBA-TPA treatment, positive MmuPV1 RNA ISH signals are
detectable in the normal skin of the wild-type mouse. The skin tumour from the
same mouse lacks a MmuPV1 RNA ISH signal. Stained cells were counted blindly.
Data are mean ± s.d. (c, d) or mean + s.d. (f, h). Scale bars, 100 µm (e, g, i, k).
Extended Data Fig. 4 | Immunization of MmuPV1-infected SKH-1 mice with
MmuPV1 vaccine protects against UV-driven carcinogenesis. a, Top,
representative images of SKH-1 mice with no evidence of disease following
infection (immune) and with visible warts after back-skin infection with MmuPV1
(non-immune). Bottom, MmuPV1 L2 RNA ISH of skin from an immune and a
non-immune mouse, collected three weeks after infection with MmuPV1, to detect
viral activity in the normal skin and the MmuPV1-driven wart. Insets highlight the
active virus in the normal skin of the immune mouse and the wart of the
non-immune mouse. b, Macroscopic images of the SKH-1 mice three months after
MmuPV1 back-skin infection. SKH-1 mice with spontaneous immunity to the
virus (no wart) were treated once with an immunosuppressive dose of UVB
(300 mJ cm−2); images of the mice three weeks after UV treatment are shown.
Arrows point to the newly developed warts on the UV-treated skin.
c, Histological images of a wart (yellow circle), stained with H&E and MmuPV1
RNA ISH. The magnified inset highlights MmuPV1-induced cytopathic changes
in the H&E image and confluent positive MmuPV1 RNA ISH signals in the wart.
d, Macroscopic images of MmuPV1-infected SKH-1 mice that continued to have
warts (yellow arrows) before MmuPV1 vaccination, four weeks after vaccination
and at the completion of the UV carcinogenesis protocol. The nine wart-bearing
mice were treated with MmuPV1 live virus particles intraperitoneally three times
over two weeks. Four weeks later, the mice underwent the UV carcinogenesis
protocol. Mice with acquired antiviral immunity (n = 5) are compared with
non-immune mice that have persistent warts (n = 4). e, Skin tumour burden in
vaccinated immune (n = 5) and non-immune (n = 4) mice treated with the UV
carcinogenesis protocol. In mice with a confluent pattern of skin tumours,
counts represent the individual lesions before their coalescence. Two-tailed
Mann-Whitney U test; data are mean + s.d. f, Representative images of
CD3/CD45-stained skin from MmuPV1/DMBA-UV SKH-1 mice compared with
sham/DMBA-UV controls at the completion of the UV carcinogenesis protocol.
Arrows indicate T cells in the epidermis; dashed lines highlight the epidermal
basement membrane. g-i, Skin-infiltrating total CD45+ leukocytes (g),
CD3+CD45+ T cells (h) and CD3−CD45+ leukocytes (i) quantified in
CD3/CD45-stained skin sections of MmuPV1/DMBA-UV (n = 10) and
sham/DMBA-UV (n = 9) SKH-1 mice across ten randomly selected HPF images of
each skin sample and averaged across the mice in each group. Each dot
represents one high-power image. Note the trend towards an increase in T cells
and a decrease in CD3− inflammatory cells in MmuPV1/DMBA-UV skin compared
with sham/DMBA-UV control. j, Representative images of CD3/CD45-stained
cells in the skin tumours of MmuPV1/DMBA-UV SKH-1 mice compared with
sham/DMBA-UV controls at the completion of the UV carcinogenesis protocol.
Magnified insets highlight the immune cells in the tumour parenchyma.
k-m, Tumour-infiltrating total CD45+ leukocytes (k), CD3+CD45+ T cells (l) and
CD3−CD45+ leukocytes (m) quantified in CD3/CD45-stained sections of
MmuPV1/DMBA-UV and sham/DMBA-UV SKH-1 skin tumours across HPF images of
each tumour and averaged across the mice in each group (n = 12 early skin
tumours per group). Each dot represents one high-power image. Stained cells
were counted blindly. Two-tailed unpaired t-test; data are mean + s.d.
(g-i, k-m). Scale bars, mouse, 1 cm (a, b, d); tissue, 100 µm (a, c, f, j).
Extended Data Fig. 5 | CD8+ T cell immunity is required to protect
MmuPV1-colonized mice from UV carcinogenesis and MmuPV1 colonization protects
Xpc−/− mice from UV carcinogenesis. a, Representative images of CD8+ T cells in
the skin tumours of MmuPV1/DMBA-UV SKH-1 mice compared with
sham/DMBA-UV controls at the completion of the UV carcinogenesis protocol.
Magnified insets highlight T cells in the tumour parenchyma.
b-d, Tumour-infiltrating CD3+ (b), CD8+ (c) and CD4+ (d) T cells quantified in
CD8/CD3- and CD4/CD3-stained tumour sections of MmuPV1/DMBA-UV and
sham/DMBA-UV SKH-1 mice across HPF images of each tumour and averaged across
the mice in each group (n = 12 early skin tumours per group). Each dot
represents one high-power image. e, f, CD4+ T cell infiltrates in
MmuPV1/DMBA-UV and sham/DMBA-UV SKH-1 skin. e, Representative images of the
CD4/CD3-stained skin sections. Arrows indicate epidermal CD4+ TRM cells;
dashed lines highlight the epidermal basement membrane. f, Quantification of
CD4+ T cells per high-power image of the skin. Ten randomly selected HPF
images of skin per mouse in each group are included. Each dot represents one
high-power image. n = 10 (MmuPV1/DMBA-UV); n = 9 (sham/DMBA-UV).
Two-tailed unpaired t-test; data are mean + s.d. (b-d, f). g, Schematic diagram
of anti-CD8 or IgG antibody treatment combined with the UV carcinogenesis
protocol. Four weeks after MmuPV1 or sham(VLP) infection, mice began
treatment with anti-CD8 or IgG isotype control antibodies (red arrows). A day
after the first treatment with antibodies, the back skin of SKH-1 mice was
treated with 50 µg DMBA once (green triangle). Seven days later, mice began
UVB treatment (100 mJ cm−2) three times a week (yellow triangles). h, Flow
cytometry analysis of spleen and skin of MmuPV1/DMBA-UV mice treated with
anti-CD8 or IgG antibodies to evaluate the efficiency of CD8+ T cell depletion
at six weeks after treatment with DMBA. The percentage of CD8+ T cells is
shown on each plot. i, Skin tumour burden in MmuPV1-colonized mice treated
with IgG control (MmuPV1 + IgG; n = 10) or anti-CD8 antibody (MmuPV1 +
anti-CD8; n = 10), and sham(VLP)-infected mice treated with IgG control
(sham(VLP) + IgG; n = 7) or anti-CD8 antibody (sham(VLP) + anti-CD8; n = 7)
after DMBA-UV treatment. Two-tailed Mann-Whitney U test; *P < 0.05; NS,
not significant. Data are mean + s.d. j, Representative images of mice in the
four treatment groups. Owing to the large skin tumours in MmuPV1-colonized
CD8+ T cell-depleted mice, the UV carcinogenesis study was terminated at
18 weeks after DMBA treatment. k, l, Xpc−/− (XPC KO) mice were infected with
MmuPV1 on their back skin (n = 15) or sham-infected (n = 13) and subjected to
the UV carcinogenesis protocol. Skin tumour outcomes are shown as the time to
development of the first skin tumour (k) and time to development of the first
invasive skin cancer (l) (log-rank test). Note that all Xpc−/− mice in the study
were immune to MmuPV1 (that is, exhibited no wart development).
m, Representative images of Xpc−/− mice at the completion of the 30-week UV
carcinogenesis protocol. Premalignant tumours (papillomas) and invasive skin
cancers are highlighted with yellow and red circles, respectively. Mice were
shaved for UV treatments and the visualization of the skin tumours.
n, Representative H&E-stained histological images of a papilloma in
MmuPV1/DMBA-UV and invasive skin cancer in sham/DMBA-UV Xpc−/− mice. The
inset shows the cellular atypia in the sham/DMBA-UV skin cancer (scale bar,
50 µm). Stained cells were counted blindly. Scale bars, mouse, 1 cm (j, m);
tissue: 100 µm (a, e, n).
Extended Data Fig. 6 | Validation of β-HPV RNA ISH using a wart as a positive
control and qPCR on RNA ISH-positive and -negative human samples.
a, Binding site of β-HPV RNA ISH and DNA ISH probes, shown on the HPV9
genome. The RNA ISH and DNA ISH probe against each type of β-HPV comprised
a pool of 20 double-Z probes that target a region of 1,000 bases (Advanced
Cell Diagnostics). b, H&E and RNA ISH staining of a wart from a 63-year-old
immunosuppressed female. Note the abundance of positive signals (red dots)
throughout the wart. c, Top, β-HPV RNA ISH of a skin cancer from an 87-year-old
immunosuppressed female, including the stains for the positive- and
negative-control probes. The detection of β-HPV by RNA ISH correlates with qPCR
positivity for transcripts of HPV5 and HPV9 E6 proteins in the same skin
cancer. Bottom, β-HPV RNA ISH of a sample of normal skin from an 18-year-old
immunocompetent African American female. The lack of RNA ISH signal (red) in
this sample correlates with undetectable transcripts of HPV5, HPV9 or HPV15 E6
proteins in qPCR of the same sample. qPCR products were visualized using gel
electrophoresis. PCR band sizes: HPV5 E6, 100 bp; HPV9 E6, 66 bp; HPV15 E6,
78 bp; keratin 14, 109 bp. Scale bars, 100 µm (b, c).
Extended Data Fig. 7 | Immunosuppressed patients have greater β-HPV viral
activity in their skin lesions compared to immunocompetent patients.
a, β-HPV RNA ISH signal counts in skin cancer cells from immunosuppressed
(n = 38) and immunocompetent (n = 32) patients. b, Clinical image of a skin
cancer surgical site showing the skin cancer (red arrow), its adjacent normal skin
(green arrow) and the normal skin away from the cancer site (blue arrow).
c, Quantification of β-HPV RNA ISH signals in high-power images across the
immunosuppressed lesions, immunocompetent lesions and normal facial skin
away from a cancer site. Skin lesions include β-HPV RNA ISH signal counts from
skin cancer (red dots) and the adjacent normal skin (green dots). Thirty samples
of normal facial skin (blue dots) from immunocompetent patients are included
(18 males and 12 females; average age 71; age range 39-94). d, Representative
low- and high-magnification images of β-HPV RNA ISH-stained normal skin
samples from immunosuppressed and immunocompetent patients. Note the
density and size of the apparent RNA ISH signals in basal-layer keratinocytes of
an immunosuppressed patient. e, Density of β-HPV RNA ISH signals in
basal-layer keratinocytes, quantified across 38 immunosuppressed and 31
immunocompetent skin samples. f, Right, β-HPV DNA ISH to detect β-HPV viral
load in the skin. Compared to β-HPV RNA ISH, which marks viral transcripts,
β-HPV DNA ISH detects viral load at a subcellular resolution in skin
keratinocytes. Note the higher level of viral DNA ISH signals compared with RNA
ISH (left), and the localization of the signals in the nucleus and cytoplasm of the
keratinocytes. Two-tailed unpaired t-test; data are mean + s.d. (a, c, e). Scale
bars, 50 µm (d, f).


[Extended Data Fig. 8 image. Panel labels: β-HPV DNA ISH; immunosuppressed (wart, HAK in verruca, squamous cell carcinoma); immunocompetent (squamous cell carcinoma). Plots: β-HPV DNA count/100 keratinocytes, skin versus cancer, for immunosuppressed and immunocompetent patients; one plot is annotated P = 0.0195.]


Extended Data Fig. 8 | β-HPV viral load is markedly reduced in skin cancer
cells compared to their adjacent normal skin in immunocompetent
patients. a, Representative DNA ISH of a wart, hypertrophic actinic keratosis
arising in association with a wart (HAK in verruca), SCC in immunosuppressed
patients and an SCC in an immunocompetent patient. Scale bars, 100 μm.
b, c, Quantification of β-HPV DNA ISH signals in paired samples of skin cancer
and the adjacent normal skin from immunosuppressed patients (b; n = 10) and
immunocompetent patients (c; n = 10) (two-tailed Wilcoxon matched-pairs
signed-rank test).

Extended Data Fig. 9 | Significantly fewer T and TRM cells infiltrate skin
cancer and the adjacent normal skin in immunosuppressed compared to
immunocompetent patients. a, Representative images of CD3/CD103-stained
SCC from immunosuppressed and immunocompetent patients (the same
cancers are shown for β-HPV RNA ISH and DNA ISH stains in Fig. 3a and Extended
Data Fig. 8a). Magnified insets highlight CD103+ TRM cells in the cancer and
adjacent normal skin. Scale bars, 100 μm. b, c, CD3/CD8/CD103-stained sections
of skin cancer were used to quantify tumour-infiltrating CD3+ T, CD103+CD3+ TRM,
CD8+ T and CD103+CD8+ TRM cells infiltrating the skin cancer parenchyma
(b), and CD3+ T, CD103+CD3+ TRM, CD8+ T and CD103+CD8+ TRM cells in the
adjacent normal skin (c) of immunosuppressed (S) versus immunocompetent (C)
patients. Note that most T cells in the normal skin reside in the dermis. Stained
cells were counted blindly in ten randomly selected HPF images of skin cancer
and adjacent normal skin from each tissue specimen and averaged across the
samples in each group; 37 immunosuppressed and 32 immunocompetent
samples of skin cancer are included (skin cancer characteristics are listed in
Supplementary Table 2). Each dot represents the average of the T cell counts in
the high-power images from each sample. Two-tailed unpaired t-test; data are
mean ± s.d. d, Cytotoxic degranulation of CD8+ T lymphocytes after exposure to
β-HPV peptides. T cells isolated from the normal facial skin of adults were
exposed to β-HPV E7 peptides (far left), HPV16 E7 peptides (middle left), PMA/
ionomycin (positive control; middle right) and medium (negative control; far
right). Representative flow cytometry plots are shown. The percentage of
CD107a+CD8+ T cells is shown on each plot. Data represent two independent sets
of experiments with similar results.

Extended Data Fig. 10 | DAMP molecules are upregulated during the
development of warts and skin cancer. a, Principal component analysis (PCA)
of gene-expression profiles obtained from MmuPV1-induced warts (n = 4; blue
triangles), MmuPV1-infected skin (n = 4; pink squares) or sham-infected skin
(n = 4; grey circles), and MmuPV1-infected tumours (n = 4; red squares) or sham-infected
tumours (n = 4; black circles) of SKH-1 mice. Note that DMBA-UV-induced
skin tumours from MmuPV1-infected mice are indistinguishable from
skin tumours from sham-infected mice, whereas both have very distinct
transcriptional profiles compared with MmuPV1-driven warts. b, c, Volcano
plots of differentially expressed genes in MmuPV1- versus sham-infected skin
(b; n = 4 per group), and skin tumours and warts (n = 12) versus MmuPV1- and
sham-infected skin (c; n = 8). Gm5416 is also known as Csta3. P values were
calculated using the DESeq2 R package (v.21.6.3), and the resulting P values were
adjusted using the Benjamini-Hochberg method for controlling the false
discovery rate. The 20 genes that were upregulated in skin tumours and warts
compared with MmuPV1- and sham-infected skin are shown in the table on the
left. d-f, Analysis of the expression of immune genes in human skin lesions on
the basis of the mouse RNA-seq data. d, Representative macroscopic and H&E-stained
histological images of SCC, wart, seborrheic keratosis (SK) and
unaffected human skin. Scale bar, 500 μm. e, Relative gene expression in SCCs
(n = 7) and warts (n = 5) compared with normal skin (n = 8). f, Normalized relative
gene expression in SCCs (n = 7), warts (n = 5) and seborrheic keratosis (n = 5)
compared for several DAMP genes. Average relative gene expression in the
normal skin was used for normalization. GAPDH is used as the reference gene.
Two-tailed Mann-Whitney U test; *P < 0.05, **P < 0.01, NS, not significant; data
are mean ± s.d. (e, f).
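
The caption above states that P values were adjusted with the Benjamini-Hochberg method to control the false discovery rate. The following is a minimal from-scratch sketch of that adjustment, written for illustration only; it is not the DESeq2 code, and the function name and the example P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that each adjusted
    # value never exceeds the adjusted value of the next-larger P value.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical raw P values, not taken from the study.
    example = [0.001, 0.008, 0.039, 0.041, 0.20]
    for raw, adj in zip(example, benjamini_hochberg(example)):
        print(f"raw={raw:.3f} adjusted={adj:.5f}")
```

Each raw value is scaled by m/rank, and the running minimum keeps the adjusted values monotone, which is the step that distinguishes this procedure from a simple per-test scaling.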

Nature Research wishes to improve the reproducibility of the work that we publish. This form provides structure for consistency and transparency
in reporting. For further information on Nature Research policies, see Authors & Referees and the Editorial Policy Checklist. 


Statistics 


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons


A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient)
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted
Give P values as exact values whenever suitable.


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 


Data collection Cytomation software (FACS), BD FACSDiva (flow), Zeiss Zen software (histological imaging), AlphaView software (PCR gel imaging), 
Applied Biosystems 7500 Real-Time PCR System software (qPCR), Illumina NovaSeq 6000 (RNA Sequencing) 
Data analysis Graphpad Prism 7 and 8, Flowjo 10, Zeiss Zen Blue, DESeq2 R package (used for RNA Seq analysis) 


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 


Data 


Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 


- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability


The authors declare that the data supporting the findings of this study are available within the paper [and its Source Data Files]. RNA-seq data are deposited to the 
Gene Expression Omnibus (GEO). Raw data supporting the findings of this study are available from the corresponding author upon reasonable request. 

Field-specific reporting


Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection.


Life sciences study design


All studies must disclose on these points even when the disclosure is negative. 


Sample size
Sample sizes were determined based on preliminary studies conducted in our laboratories and power analysis. For experiments in which no
preliminary data were available in our laboratories or in similarly published research, the sample sizes chosen were sufficient to determine
significance in all assays, with reproducible, statistically significant differences between conditions in all the experiments. For animal experiments,
we followed the 3 R's of animal research and used only the number of animals required to reach conclusive outcomes.


Data exclusions
No data were excluded from the study.


Replication
Animal studies were repeated in two independent sets of experiments with similar results. Representative data were repeated at least two
times with similar results.


Randomization
A simple randomization scheme was applied. Mice were randomly allocated to experimental or treatment groups. Test and control mice of the
same strain were gender and age matched.


Blinding
For tumor outcome measurements in animal studies, investigators were aware of host genotype and infection status of the mice while
relying on unbiased measurements of quantitative parameters. The histological analyses were carried out in a blinded manner.


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems
Antibodies
Eukaryotic cell lines
Palaeontology
Animals and other organisms
Human research participants
Clinical data


Methods
ChIP-seq
Flow cytometry
MRI-based neuroimaging


Antibodies

Antibodies used 


Immunostaining Primary Antibodies, Clone, Company:
Rat Anti-Mouse/Human, CD3, ab11089, CD3-12, Abcam, Cambridge, MA
Rabbit Anti-Mouse, CD4, ab183685, EPR19514, Abcam
Rabbit Anti-Mouse, CD8, 989415, D4B9C, Cell Signaling Technology, Danvers, MA
Rabbit Anti-Mouse, Ki67, ab15580, Polyclonal, Abcam
Mouse Anti-Mouse, Keratin 6, ab18586, Ks6.KA12, Abcam
Rabbit Anti-Mouse, CD45, ab10558, Polyclonal, Abcam
Mouse Anti-Human, CD8a, 703065, C8/144B, Cell Signaling Technology
Rabbit Anti-Human, CD103, ab129202, EPR4166(2), Abcam


Secondary Antibodies and Staining Kits, Company: 

Goat anti-Rabbit IgG, Alexa Fluor® 488 conjugate, A-11034, polyclonal, Thermo Fisher Scientific, Waltham, MA 
Goat anti-Rabbit IgG, Alexa Fluor®568 conjugate, A-11036, polyclonal, Thermo Fisher Scientific 

Goat anti-Mouse IgG, Alexa Fluor® 568 conjugate, P-852, polyclonal, Thermo Fisher Scientific 

Goat anti-Mouse IgG, Alexa Fluor® 647 conjugate, A-21235, polyclonal, Thermo Fisher Scientific 

Goat anti-Rat IgG, Alexa Fluor® 488 conjugate, A-11006, polyclonal, Thermo Fisher Scientific 


Mouse Flow Cytometry Antibodies, Clone, Company: 
CD3e-PE-Cy7, 100320, 145-2C11, BioLegend, San Diego, CA 
CD4-APC-Cy7, 100526, RM4-5, BioLegend
CD8a-FITC, 100706, 53-6.7, BioLegend
CD45-APC, 103112, 30-F11, BioLegend
CD62L-PerCP/Cy5.5, 104432, MEL-14, BioLegend


Human Flow Cytometry Antibodies, Clone, Company:
CD3-FITC, 555916, UCHT1, BD Biosciences, San Jose, CA
CD4-APC eFluor 780, 47-0049-42, RPA-T4, eBioscience
CD8-PerCP/Cy5.5, 344709, SK1, BioLegend
CD45-APC, 304012, HI30, BioLegend
CD69-BV421, 310930, FN50, BioLegend
CD107a-BV605, 328633, H4A3, BioLegend
CD137-BUV395, 745737, 4B4-1, BD Biosciences


Depleting Antibodies
Anti-Mouse CD8a, BPO117, YTS 169.4, BioXCell, West Lebanon, NH
Rat Isotype Control, i4313, Polyclonal, Sigma-Aldrich, St. Louis, MO


Validation All antibodies are commercially available and validated by previous studies done by multiple laboratories including ours. The
complete descriptions and information about each antibody are provided in the corresponding data sheets available on the
manufacturers' websites.


Animals and other organisms 


Policy information about studies involving animals; ARRIVE guidelines recommended for reporting animal research 


Laboratory animals Mus musculus, C57BL/6J, female, age 6-10 weeks. 
Mus musculus, FVB, Wt and CD4-/-, CD8-/- female, age 6-10 weeks. 
Mus musculus, SKH-1, female, age 6-10 weeks. 
Mus musculus, XPC-/-, female and male, age 6-10 weeks. 
Mus musculus, B6.Cg-Foxn1nu/Foxn1nu, female and male, age 6-10 weeks.
All animals except CD4-/-, CD8-/- (provided by Dr. David DeNardo) were purchased from Jackson Laboratory or Charles River 
Laboratories. 


Wild animals The study did not involve wild animals. 
Field-collected samples The study did not involve field-collected samples 
Ethics oversight Massachusetts General Hospital and University of Louisville IACUC approved the animal studies.


Note that full information on the approval of the study protocol must also be provided in the manuscript. 


Human research participants 


Policy information about studies involving human research participants 


Population characteristics De-identified archived skin cancer samples were obtained from the Massachusetts General Hospital Pathology Department. Discarded
de-identified normal human skin samples were obtained through Mohs surgery clinics at Massachusetts General Hospital.


Recruitment Normal skin and discarded tissues were collected as part of a discarded de-identified tissue sample use protocol.


Ethics oversight Massachusetts General Hospital IRB approved the de-identified human tissue analysis.


Note that full information on the approval of the study protocol must also be provided in the manuscript. 


Flow Cytometry 


Plots 


Confirm that: 


The axis labels state the marker and fluorochrome used (e.g. CD4-FITC). 


The axis scales are clearly visible. Include numbers along axes only for bottom left plot of group (a 'group' is an analysis of identical markers). 


All plots are contour plots with outliers or pseudocolor plots. 

Methodology


Sample preparation
In the experiment depicted in Fig. 1e, j and Extended Data Fig. 2, lymph nodes of MmuPV1 infected and Parvovirus vaccinated wild-type
FVB mice were harvested and sorted using flow cytometry. To stain peripherally circulating lymphocytes, mice were
injected intravenously with 2 μg CD45 APC three minutes prior to euthanasia. Single cell suspensions of donor lymph nodes were
combined and stained for FACS. Cells were stained for 30 minutes on ice with the following surface markers diluted with PBA
(500 mL PBS, 25 mL Newborn Calf Serum, 1 mL 10% sodium azide): CD3e PE-Cy7, CD4 APC-Cy7, CD8a FITC, CD45 APC, and
CD62L PerCp/Cy5.5. Cells were sorted using a MoFlo XDP Cell Sorter.

To examine the presence/absence of circulating T cells in recipient mice (Extended Data Fig. 2b), peripheral blood was collected
from T cell recipient mice 2 months following the T cell transfer. Two to three drops of blood per mouse were collected via the
submandibular vein in 10 mL of RBC Lysis Buffer (Biolegend, 420301), spun at 300 g for 5 min at 4 °C, and resuspended in a cocktail with the
following antibodies in PBA: CD3e PE-Cy7 (Biolegend, 100320), CD4 APC-Cy7 (Biolegend, 100414), and CD8a FITC (Biolegend,
100706).

In the CD8+ T cell depletion study, a subset of MmuPV1/DMBA-UV mice that were treated with anti-CD8 or IgG isotype control were
harvested 6 weeks after the start of antibody treatment. Spleen and skin single cells were prepared and stained with the
following antibodies in PBA: CD45 APC, CD3e PE-Cy7 (Biolegend, 100320), CD4 APC-Cy7 (Biolegend, 100414), and CD8a FITC
(Biolegend, 100706).

Human skin cells were collected, used in an ex vivo peptide assay, stained with antibodies to surface markers for T cell activation
(Supplementary Table 5a), and examined by flow cytometry (BD LSRFortessa X-20).

Flow data were analyzed using FlowJo software (Ashland, OR).


Instrument
MoFlo XDP Cell Sorter (Beckman Coulter) using Cytomation software (for mouse studies) and BD LSRFortessa X-20 (for human
flow studies)


Software
FlowJo 10


Cell population abundance
In the T cell transfer experiment depicted in Fig. 1e, j and Extended Data Fig. 2, T cells from MmuPV1-immune mice (CD4: 2.53
million cells; CD8: 408,000 cells) and Parvovirus-vaccinated mice (CD4: 826,000 cells; CD8: 143,000 cells) were sorted. CD4 and
CD8 cells were combined at a 6:1 ratio and 129,600 cells were transferred into each recipient mouse.


Gating strategy
For T cell transfer experiments depicted in Fig. 1e, j and Extended Data Fig. 2, memory CD4+ T cells were sorted using the
lymphocyte gate in the FSC/SSC plot, singlets, CD3e PE-Cy7+, CD4 APC-Cy7+, CD62L PerCp/Cy5.5-, and CD45 APC- cells. Memory CD8+
T cells were sorted using CD3e PE-Cy7+, CD8 FITC+, CD62L PerCp/Cy5.5-, and CD45 APC-. The gating strategy is depicted in
Extended Data Fig. 2a.

T cell analyses of blood were gated on lymphocytes in the FSC/SSC plot, singlets, CD3e PE-Cy7+, CD8 FITC+ vs. CD4 APC-Cy7+.

T cell analyses of spleen and skin were gated on lymphocytes in the FSC/SSC plot, singlets, CD45 APC, CD3e PE-Cy7+, CD8 FITC+ vs.
CD4 APC-Cy7+.

For the analysis of T cells isolated from human skin and used in ex vivo peptide assays, cells were gated on lymphocytes in the
FSC/SSC plot, singlets, CD45 APC+, CD3e FITC+, CD4 APC eFluor 780+ vs. CD8 PerCP/Cy5.5+, CD69 BV421+, CD137 BUV395+,
CD107a BV605+.


Tick this box to confirm that a figure exemplifying the gating strategy is provided in the Supplementary Information.
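
The cell population abundance entry above reports that CD4 and CD8 cells were combined at a 6:1 ratio and that 129,600 cells were transferred into each recipient mouse. As a rough illustration of what that mix implies per mouse, the sketch below splits a total by a ratio. The function name is hypothetical and the rounding choice is an assumption; the authors do not report per-subset counts for each recipient.

```python
def split_by_ratio(total_cells, cd4_parts=6, cd8_parts=1):
    """Split a total cell count into CD4 and CD8 counts at a given ratio.

    The CD8 share is rounded to the nearest whole cell and the CD4 share
    takes the remainder, so the two counts always sum to the total.
    """
    cd8 = round(total_cells * cd8_parts / (cd4_parts + cd8_parts))
    cd4 = total_cells - cd8
    return cd4, cd8


if __name__ == "__main__":
    # 129,600 cells per recipient and the 6:1 ratio are taken from the text.
    cd4, cd8 = split_by_ratio(129_600)
    print(f"CD4 per mouse: {cd4:,}; CD8 per mouse: {cd8:,}")
```

Under these assumptions each recipient would receive roughly 111,000 CD4 and 18,500 CD8 T cells.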

CDK phosphorylation of TRF2 controls
t-loop dynamics during the cell cycle

The protection of telomere ends by the shelterin complex prevents DNA damage
signalling and promiscuous repair at chromosome ends. Evidence suggests that the 3′
single-stranded telomere end can assemble into a lasso-like t-loop configuration,
which has been proposed to safeguard chromosome ends from being recognized as
DNA double-strand breaks. Mechanisms must also exist to transiently disassemble
t-loops to allow accurate telomere replication and to permit telomerase access to the
3′ end to solve the end-replication problem. However, the regulation and
physiological importance of t-loops in the protection of telomere ends remains
unknown. Here we identify a CDK phosphorylation site in the shelterin subunit at
Ser365 of TRF2, whose dephosphorylation in S phase by the PP6R3 phosphatase
provides a narrow window during which the RTEL1 helicase can transiently access and
unwind t-loops to facilitate telomere replication. Re-phosphorylation of TRF2 at
Ser365 outside of S phase is required to release RTEL1 from telomeres, which not only
protects t-loops from promiscuous unwinding and inappropriate activation of ATM,
but also counteracts replication conflicts at DNA secondary structures that arise
within telomeres and across the genome. Hence, a phospho-switch in TRF2
coordinates the assembly and disassembly of t-loops during the cell cycle, which
protects telomeres from replication stress and an unscheduled DNA damage
response.


Telomere homeostasis is crucially dependent on the function of the
shelterin complex but how this is regulated during the cell cycle remains
uncertain. Using phospho-proteomic analysis of the shelterin complex, we
identified a putative CDK2 phosphorylation site in human TRF2 at Ser365
(Ser367 in mouse; Extended Data Fig. 1a), which is abolished by treatment
with λ-protein phosphatase (Fig. 1a, left and middle) or mutation of the
phospho-site to alanine (Myc-tagged TRF2(S367A); Fig. 1a, right). Analysis
of the cell cycle revealed that this modification is abundant in G1, G2 and M
phases but is markedly reduced in S phase (Fig. 1b, Extended Data Fig. 1b).

Deletion of Trf2 (also known as Terf2) results in telomere deprotection
and chromosome end-to-end fusions (Fig. 1c, right). By contrast,
Trf2’ mouse embryonic fibroblasts (MEFs) complemented with wild-type
TRF2 or phospho-dead (Myc-TRF2(S367A)) or phospho-mimetic
(Myc-TRF2(S367D) and Myc-TRF2(S367E)) mutants lacked telomere
fusions (Fig. 1c, left, Extended Data Fig. 1c, d). The TRF2 Ser367 mutants
also retained interactions with other shelterin proteins, including TRF1
and RAP1, and depletion of RAP1 did not result in telomere fusions in
cells expressing Myc-TRF2(S367A) (Extended Data Fig. 2a-c). Hence,
TRF2 Ser367 mutants retain the ability to engage with other shelterin
components and to protect telomeres against fusions.

Further analysis of TRF2-null cells expressing the TRF2 Ser367 mutants
showed that the phospho-dead Ser367Ala mutant (Myc-TRF2(S367A))
resulted in high levels of telomere fragility, which indicates problems
in telomere replication, whereas the phospho-mimetic mutants
(Myc-TRF2(S367D/E)) resulted in frequent loss of telomeres, signal-free ends
and high levels of extra-chromosomal telomere circles (Fig. 1d-g).
Because the distinct phenotypes of the phospho-dead and phospho-mimetic
TRF2 Ser367 mutants resemble cells that fail to recruit the helicase
RTEL1 to replication forks and telomeres, respectively, we reasoned
that Ser365 or Ser367 of TRF2 might serve as a phospho-dependent
TRF2-RTEL1 protein-interaction surface, which could cooperate with
the TRFH domain that was previously shown to interact with RTEL1.
Pull-down experiments using biotinylated human TRF2 peptides encompassing
amino acids 354-383 revealed a prominent RTEL1 band with
the unphosphorylated peptide (S365) but not with the phosphorylated
peptide (pS365) or an unrelated TRF2 control peptide (384-413) (Fig. 2a,
b, Extended Data Fig. 3a). These results raised the possibility that the
phosphorylation of Ser365 or Ser367 of TRF2 negatively regulates the
interaction between TRF2 and RTEL1. The addition of λ-protein phosphatase
was found to enhance this association in cell extracts (Fig. 2c),
whereas addition of the phosphatase inhibitor PhosSTOP prevented a
robust TRF2-RTEL1 interaction (Fig. 2c). Treatment of cells with the
CDK inhibitor R-roscovitine, but not with a PLK1 inhibitor (BI-2536),
also enhanced levels of Myc-TRF2 co-immunoprecipitated with RTEL1


1The Francis Crick Institute, London, UK. 2Genome Integrity Unit, Children’s Medical Research Institute, University of Sydney, Westmead, New South Wales, Australia. 3School of Medicine, The
University of Notre Dame Australia, Sydney, New South Wales, Australia. 4Dana-Farber Cancer Institute, Harvard Institute of Medicine, Boston, MA, USA. 5These authors contributed equally:

Fig. 1 | Mutations in TRF2 at Ser365 or Ser367 result in dysfunctional
telomeres. a, Whole-cell extracts from HEK 293 cells (left) or HEK 293 cells
expressing mouse Myc-tagged TRF2 (Myc-TRF2) or the indicated mutants
(right) were pre-treated with λ-protein phosphatase (λPP) and subjected to
western blotting. Ctrl, control; IP, immunoprecipitate; S/A, phospho-dead
mutant Myc-TRF2(Ser367Ala); WT, wild type. b, IgG control and endogenous
phospho-TRF2 immunoprecipitates from HEK 293 cells at the indicated stages
of the cell cycle. c-f, Quantification of telomere fusions (c), telomere fragility
(d, e) and telomere loss (f) per metaphase from Trf2” MEFs stably expressing
the indicated Trf2 genotypes (n = 35 analysed metaphases). S/D and S/E,
phospho-mimetic Myc-TRF2 mutants Ser367Asp and Ser367Glu, respectively.


(Fig. 2d). The interaction of RTEL1 with wild-type Myc-TRF2, but not
with the Myc-TRF2(S365A) mutant, was inhibited after incubation with
recombinant cyclinA-CDK2 (Fig. 2e), which supports previous findings
that Ser365 of TRF2 is a cyclinA-CDK substrate. Inhibition of ERK1/2
also had no effect on the phosphorylation of TRF2(Ser365) (Extended
Data Fig. 3b). Whereas both phospho-mimetic mutants abolished the
TRF2-RTEL1 interaction in cells, the phospho-dead Myc-TRF2(S367A)
mutant interacted to a much greater extent with RTEL1 compared with
wild-type Myc-TRF2 (Fig. 2f, g). Hence, TRF2 phospho-mimetic mutants
abrogate the TRF2-RTEL1 interaction, which results in telomere loss and
increased telomere circles, whereas the phospho-dead TRF2(S367A)
mutant enhances the TRF2-RTEL1 interaction and results in telomere
fragility. We conclude that CDK phosphorylation of TRF2 at Ser365 or
Ser367 inhibits its interaction with RTEL1.

Given that phosphorylation of TRF2 at Ser365 is markedly reduced
in S phase when TRF2 recruits RTEL1 to telomeres, we considered

Data are mean ± s.e.m. Representative images of telomere FISH experiments
are shown in c and e. Asterisks indicate telomere fragility, arrowheads denote
loss of telomere signal. Red, telomere peptide nucleic acid (PNA) FISH; blue,
DAPI. g, Phi29-dependent telomere circles (TCs; top) and quantification
(bottom) in DNA isolated from Trf2“ MEFs stably expressing empty vector
control, wild-type or mutant TRF2, 96 h after infection with control or Cre-expressing
adenovirus (Ad-GFP or Ad-Cre-GFP, respectively). Data are
mean ± s.d. from three independent experiments. In a-f, experiments were
independently repeated at least twice with similar results. All P values
determined by one-way analysis of variance (ANOVA).


the possibility that TRF2 Ser365 is actively de-phosphorylated. TRF2
and RTEL1 complexes purified from S phase and asynchronous cells
contained a number of phosphatases and/or regulatory subunits that
showed increased association with TRF2 and/or RTEL1 in S phase,
including UBLCP1, PP1R10, PP4R1, PP4R2, PP6R2, PP2R5C and PP6R3
(Extended Data Fig. 4a). Of these phosphatases, knockdown by short
interfering RNA (siRNA) of PP4R2 or PP6R3 or their respective catalytic
subunits (PP4C or PP6C, respectively), greatly reduced the TRF2-RTEL1
interaction in two different cell lines (Extended Data Figs. 4b, c, 5a, b).
Co-immunoprecipitation studies confirmed that PP4R2 and PP6R3
regulatory subunits interact with TRF2 and RTEL1 in vivo (Extended
Data Fig. 4d). Notably, phosphorylation of human TRF2 at Ser365 and
mouse TRF2 at Ser367 were greatly enhanced after silencing of PP6R3
but not in cells subjected to PP4R2 knockdown (Extended Data Fig. 5c,
d). Cells depleted of the PP4R2 or PP6R3 regulatory subunits also exhibited
telomere loss (Extended Data Fig. 5e) and a greater than threefold

Fig. 2 | The Ser365 and Ser367 phospho-site in TRF2 controls TRF2-RTEL1
and RTEL1-PCNA interactions. a, Domain organization of mammalian TRF2
protein. TBM, TIN2-binding motif; RBM, RAP1-binding motif. b, Western blots
of peptide pull-downs from HEK 293 cells expressing pHAGE-HA-Flag-RTEL1
(WT) or empty vector (Ctrl). c, Western blot of input and RTEL1
immunoprecipitates from Myc-TRF2 samples treated with vehicle control,
λ-protein phosphatase or phosphatase inhibitor (λPP + stop). d, Western blot of
input and RTEL1 immunoprecipitates from extracts of HEK 293 cells expressing
Myc-TRF2 and pre-treated with vehicle, PLK1 inhibitor (PLK1i) or CDK inhibitor
R-roscovitine (Rosc.) for 24 h. e, Immunoprecipitates from HEK 293 cells were


induction in telomere circles when compared with controls (Extended
Data Fig. 5f). These data indicate that PP6R3 dephosphorylates TRF2
Ser365 or Ser367 to permit the transient recruitment of RTEL1 to telomeres
in S phase.

Because RTEL1 facilitates global and telomere replication through
its ability to interact with PCNA, we considered the possibility
that the phospho-dead TRF2(S367A) mutant might sequester RTEL1
and limit its ability to bind PCNA. Indeed, cells expressing the
phospho-dead TRF2(S367A) mutant, but not wild-type TRF2 or the

subjected to an in vitro immunoprecipitation kinase assay with ATP and
purified cyclinA-CDK2 complex (CycA-CDK2), resolved by SDS-PAGE and
blotted using anti-Myc (TRF2) and anti-RTEL1 antibodies. Input (5%) is shown
on the right. f, h, Immunoprecipitates were resolved by SDS-PAGE and
analysed by western blotting for co-precipitated Myc-TRF2 (f) or PCNA (h).
Input (5%) is shown. g, i, Cells as in f and h were quantified by in situ PLA assay
for the interaction between Myc-TRF2 and RTEL1 (g) or PCNA and RTEL1 (i).
Data are mean ± s.e.m.; n = 50 nuclei analysed. In b-i, experiments were
independently repeated at least twice with similar results. P values determined
by one-way ANOVA.


phospho-mimetic mutants, were compromised for the RTEL1-PCNA
interaction in co-immunoprecipitation and proximity ligation assay
(PLA) experiments (Fig. 2h, i). Furthermore, analysis of global replication
dynamics revealed that TRF2-null cells expressing the phospho-dead
TRF2(S367A) mutant, but not wild-type TRF2 or the phospho-mimetic
mutants, exhibited reduced replication fork extension rates and
increased asymmetric forks across the genome (Fig. 3a, Extended Data
Fig. 6a, b). Cells expressing the Myc-TRF2(S367A) phospho-dead mutant
also exhibited increased levels of replication stress, which manifested

Fig. 3 | Disruption of the interaction between RTEL1 and the phospho-dead
TRF2 mutant rescues abnormal genome-wide replication phenotypes.
a, Quantification (left) and representative scatter plots (right) of fork
asymmetry (n denotes number of analysed forks). Data are mean ± s.e.m.
of triplicate experiments. b, Western blots of input and RTEL1
immunoprecipitates from cells of the indicated genotype. R/H, V5-RTEL1(R1237H);
S/A, Flag-TRF2(S367A). c, Left, quantification of the PCNA-RTEL1
interaction as determined by in situ PLA assay (n = 70 nuclei analysed).
Data are mean ± s.e.m. Right, representative images of the telomere FISH
experiments. d, Left and middle, quantification of telomere fragility (left) and
telomere loss (middle) per metaphase determined from the same cells as in b


as micronuclei, mitotic catastrophe and increased 53BP1 nuclear foci 
(Extended Data Fig. 6c-f). 

We reasoned that if the phospho-dead TRF2(S367A) mutant sequesters 
RTEL1 at telomeres, then expressing RTEL1 with a mutation in the C4C4 
motif that is defective for TRF2 binding’ (RTEL1(R1237H)) should miti- 
gate this effect. Indeed, co-expression of the phospho-dead Flag-tagged 
TRF2(S367A) with the V5-tagged RTEL1(R1237H) mutant, but not with 
wild-type V5-RTEL1, restored the interaction between RTEL1 and PCNA in 
mouse ear fibroblasts (Fig. 3b, c). This co-expression also suppressed the 
levels of fragile telomeres (Fig. 3d), rescued the DNA replication defects 
(Fig. 3e, f, Extended Data Fig. 7a, b), and suppressed the formation of 
micronuclei, mitotic catastrophe and 53BP1 foci in mouse ear fibroblasts 
expressing the phospho-dead Flag-TRF2(S367A) mutant (Extended Data 
Fig. 7c-f). These data suggest that the TRF2(S367A) mutant sequesters 
the endogenous pool of RTEL1, potentially at both telomeres and 
pericentromeric regions”, which restricts its ability to engage with PCNA leading 
to replication stress at telomeres and across the genome. 

RTEL1 has been shown to unwind D-loops based on genetic studies 
and its ability to resolve such structures in vitro’. However, evidence 
demonstrating a direct role in unwinding t-loops in vivo, which contain 
a D-loop at the point of strand invasion, remains lacking. Because the 
TRF2(S367A) phospho-dead mutant sequesters RTEL1 at telomeres, we 
asked what would happen to t-loops in this context. Visualization of 
telomere secondary structures by Airyscan super-resolution microscopy 
revealed no measurable reduction in t-loop abundance in TRF2-null 
(n = 62 metaphases). Data are mean ± s.e.m. Right, representative images of the 
telomere FISH experiments. Asterisks denote telomere fragility; arrowheads 
indicate telomere loss. e, Quantification of global replication fork dynamics 
(n denotes the number of analysed forks). Data are mean ± s.e.m. of triplicate 
experiments. In box plots, horizontal line denotes the median; whiskers denote 
the 10th and 90th percentiles. f, Quantification (left) and representative 
scatter plots (right) of fork asymmetry (n denotes the number of analysed 
forks). Data are mean ± s.e.m. from three experiments. In b-d, the experiment 
was independently repeated at least twice with similar results. P values 
determined by one-way ANOVA (a, e, f) or unpaired two-tailed 
Student’s t-test (c, d). 


cells expressing wild-type TRF2® (Fig. 4a, Extended Data Fig. 8a). 
However, the frequency of t-loops was significantly diminished in 
TRF2-null cells expressing the Myc-TRF2(S367A) phospho-dead mutant 
(Fig. 4b, c). Hence, sequestration of RTEL1 at telomeres leads to pro- 
miscuous t-loop unwinding, decreasing the overall levels of t-loops. 
The spurious t-loop unwinding observed in the TRF2(S367A) mutant 
presented an opportunity to directly test whether t-loops are important 
for suppressing the DNA damage response (DDR) at telomeres. Analy- 
sis of TRF2-null cells expressing the phospho-dead Myc-TRF2(S367A) 
mutant revealed a largely ATM-dependent DDR induction at telomeres, 
albeit with a modest accumulation of DNA damage-induced RPA foci 
and activation of ATR due to telomere fragility (Fig. 4d-g, Extended Data 
Fig. 8b-d). Measuring the lengths of telomere contours in super-resolution 
micrographs revealed that linear telomeres from cells expressing Myc- 
TRF2(S367A) overlapped in length distribution with looped telomeres 
from the wild-type Myc-TRF2 control with protected chromosome ends 
(Extended Data Fig. 8e, f). These data suggest that promiscuous t-loop 
unwinding results in linear telomeres that activate an ATM-dependent 
DDR (Fig. 1c). Collectively, these data reveal that the t-loop structure is 
important for suppressing the activation of ATM at telomere ends. 
In conclusion, our study identifies a phospho-switch in TRF2 that regulates 
the transient recruitment and release of RTEL1 from telomeres, 
which is required to temporarily disassemble t-loops during S phase 
to avert telomere catastrophe’”*™, while also preventing promiscuous 
t-loop unwinding during other cell cycle stages. We suggest that such 


[Fig. 4 image. Legible panel labels: TRF2 (short and long exposure), 4-OHT (- or +), Ctrl, WT, S/A, Trf2F/F;Rosa26CreERT2, Ad-Cre-GFP, ATRi/ATMi, No 4-OHT, 120 h 4-OHT.]

Fig. 4 | Expression of the TRF2(S367A) mutant promotes telomere- 
dysfunction-induced foci and impairs the formation of t-loops. a, Western 
blotting analysis of cells of the indicated genotype. Short and long indicate 
short and long exposure times. b, Quantification of t-loops observed in Trf2” 
MEFs expressing wild-type Myc-TRF2 or the Ser367Ala mutant with or without 
treatment with 4-hydroxytamoxifen (4-OHT) for 120 h. Data are exclusive of 
ambiguous molecules (n = 3 biological replicates scoring ≥1,192 molecules per 
replicate). Data are mean ± s.e.m. c, Representative images of t-loops and linear 
telomeres identified by Airyscan super-resolution imaging. Scale bar, 1 µm. 

d, Quantification of telomere-dysfunction-induced foci (TIF) per metaphase 


exquisite control of TRF2 to regulate t-loop opening and the need to 
‘protect’ t-loops from promiscuous unwinding by RTEL1 outside of 
S phase, further demonstrate that t-loops are essential for physiological 
telomere homeostasis and chromosome end protection. 
Methods 


Cell culture procedures 

SV40-LT-immortalized Rtel1'” MEFs’, Trf2'” MEFs (a gift from T. de 
Lange)’, Trf2F/F;Rosa26CreERT2 MEFs (a gift from E. Lazzerini-Denchi)» 
and Trf2F/F;Rtel1F/F mouse ear fibroblasts were cultured in DMEM 
supplemented with 15% fetal bovine serum (FBS; Invitrogen), L-glutamine 
and penicillin-streptomycin. HEK 293 and Phoenix Ampho 293 cells 
were kept in DMEM with 10% FBS. Trf2F/F;Rtel1F/F conditional double-knockout 
mouse ear fibroblasts were isolated from adult mice obtained 
by crossing of the individual targeted Trf2‘” and Rtel1’” mice. Genotypes 
were determined by Transnetyx using quantitative PCR with allele-specific 
probes. Production of retroviral supernatants and transductions 
of Rtel1'” and Trf2 MEFs were done essentially as previously 
described’. Trf2” MEFs were infected with pLPC-puromycin retroviruses 
expressing control vector, Myc-tagged wild-type TRF2 or S367 mutants. 
Trf2F/F;Rosa26CreERT2 MEFs were infected with pLPC-puromycin retroviruses 
expressing Myc-tagged wild-type TRF2 or mutant TRF2(S367A). 
Human HEK 293 cells were transduced with pLPC-puromycin 
retroviruses carrying Myc-tagged wild-type or mutant (S365A) TRF2. 
Trf2F/F;Rtel1F/F mouse ear fibroblasts were complemented with pLPC-hygromycin 
retroviruses carrying Flag-tagged TRF2(S367A) and pBABE-puromycin 
retroviruses expressing mouse V5-tagged wild-type RTEL1 
or RTEL1(R1237H) mutant. Transduced cells were selected with puromycin 
(2 µg ml−1) for 2-6 days. Trf2F/F;Rtel1F/F mouse ear fibroblasts 
were kept under puromycin (2 µg ml−1) and hygromycin (150 µg ml−1) 
selection for 5 days. Deletion of floxed alleles in Rtel1’ and Trf2” MEFs 
was carried out with Ad-Cre-GFP adenovirus (Vector Biolabs) and cells 
were genotyped by PCR at 96 h after infection as previously described*’. 
TRF2 was deleted in Trf2F/F;Rosa26CreERT2 MEFs by adding 1 µM 4-hydroxytamoxifen 
(Sigma-Aldrich) to the culture medium. Cell lines were 
routinely tested for mycoplasma contamination with negative results. 


Cell lysis, western blotting, immunoprecipitation and drug 
treatments 

Cells were rinsed twice with PBS, transferred to an ice-cold NET lysis 
buffer (50 mM Tris (pH 7.2), 150 mM NaCl, 0.5% NP-40, 1x EDTA-free 
Complete protease inhibitor cocktail (Roche), 1x PhosSTOP phosphatase 
inhibitor cocktail (Roche)) and lysed for 10 min on ice. The cell 
lysates were then briefly vortexed and passed through a 23G syringe five 
times. The soluble protein fractions were collected after centrifugation 
at 16,000g for 10 min at 4 °C. Western blotting analysis was performed 
as previously described’. Immunoblots of whole-cell extracts from 
Trf2F/F;Rosa26CreERT2 cells with or without exogenous expression of the 
Myc-TRF2 allele were performed as previously described”. For protein 
immunoprecipitation, whole-cell extracts were precleared with protein 
G Sepharose (Sigma-Aldrich) and 1-2 mg of precleared extract was 
incubated with the indicated antibodies. Immunocomplexes were 
subjected to SDS-PAGE followed by immunoblotting using nitrocellulose 
membrane (GE Healthcare). See Supplementary Table 1 for a list of 
antibodies used. For inhibition of CDKs, R-roscovitine (Sigma-Aldrich) 
was used at a final concentration of 10 µM for 24 h. PLK-1 was inhibited 
by BI-2536 (Axon Medchem) at a final concentration of 100 nM for 24 h. 
The MEK-ERK signalling pathway was inhibited by the MEK1 and MEK2 
inhibitor U0126 (Selleckchem) at a concentration of 30 µM for 24 h. 
An equal amount of DMSO was used as a vehicle control. 


siRNA treatment and siRNA oligonucleotides 

Transfections with siRNA oligonucleotides were performed using 
Lipofectamine RNAiMAX (Thermo Fisher Scientific). In brief, human cells 
at density of 2.0 x 10° cells per well were transfected in a 6-well plate 
with 40 pmol siRNA. Mouse cells at density of 3.0 x 10° cells per well 
were transfected with 150 pmol siRNA. The medium was exchanged 24h 
after transfection. Then, 72h after transfection, the cells were collected 
and the levels of proteins of interest were assessed by immunoblot 


analyses as described. For silencing experiments in human and mouse 
cells, pre-designed SMARTpool ON-TARGETplus and Accell siRNA oli- 
gonucleotides (Dharmacon; GE Healthcare) were used, respectively. 
For siRNA oligonucleotide details, see Supplementary Table 2. 


λ-Phosphatase treatment 

Whole-cell extracts were prepared as described above except in the 
absence of phosphatase inhibitors. Lysates were incubated with 800 U 
of λ-phosphatase (New England Biolabs) in NET lysis buffer supplemented 
with 1 mM of MnCl2 along with protease inhibitors for 30 min at 
30 °C. Next, the lysates were incubated on ice for 15 min and subjected 
to immunoprecipitation as detailed in the main text. 


In vitro kinase assay 

Whole-cell extracts from HEK 293 cells were incubated for 1h at 4 °C 
with the rabbit polyclonal anti-RTEL1 antibody. Immunocomplexes 
were coupled to protein G Sepharose beads for an additional 1 h at 4 °C 
and washed three times with the NET lysis buffer followed by two washes 
with kinase buffer (20 mM Tris, pH 7.5, 50 mM KCl, 7.5 mM MgCl2, 10 mM 
MnCl2, 1 mM DTT, 1x PhosSTOP phosphatase inhibitors). Kinase reactions 
were performed by incubating the immunocomplexes with 20 µl 
of kinase buffer containing cold adenosine triphosphate (ATP) and 
1 µg recombinant CDK2-cyclinA protein complex for 20 min at 37 °C. 
Reactions were washed twice with kinase buffer and terminated by the 
addition of 5x SDS-PAGE sample buffer, and resolved by SDS-PAGE. 


Generation of pTRF2(Ser365/367) phospho-specific antibodies 
Rabbit polyclonal antibodies against human TRF2 phosphorylated at 
Ser365 and mouse TRF2 phosphorylated at Ser367 were generated by 
Kaneka Eurogentec S.A. Biologics Division. The antibodies were raised 
against the phosphorylated human C-(PTQALPA[pS]PALKNKR)-N and 
mouse C-(ANLASPS[pS]PAHKHKR)-N TRF2 sequences conjugated 
through the added C-terminal cysteine to keyhole limpet hemocyanin 
(KLH). Phosphoserine 365- and 367-specific antibodies were purified 
with the use of the corresponding sulfolinked phospho- and unphos- 
phorylated peptides. The specificity of each antibody was confirmed 
by ELISA and immunoblot assays. 


Peptide synthesis and peptide pull-down experiments 

The peptide pull-down was carried out using the biotinylated peptides. 
In brief, 36 µg of each of the peptides was coupled to 40 µl of 
streptavidin-coated magnetic beads (Invitrogen) and added to 1 mg 
of nuclear extract of HEK 293 cells expressing pHAGE-HA-Flag-RTEL1. 
Nuclear extracts were precleared by incubation for 30 min at room 
temperature with uncoupled beads before pull-down incubation. The 
coupled beads and the lysates were incubated for 2h at 4 °C. The beads 
were washed four times with TBST (Tris-buffered saline, 0.1% Tween- 
20), resuspended in 2x SDS loading sample buffer, and boiled for 5 min. 


Slot-blot assay 

TRF2 peptides diluted into a final volume of 200 µl in SSC 2x were 
applied under gentle vacuum to Trans-Blot nitrocellulose membrane 
(Bio-Rad) using a Minifold 48 slots, Whatman apparatus (GE Healthcare). 
Each well was washed with 200 µl aliquots of SSC 2x. After removing 
SSC 2x with gentle suction, the membrane was removed from the 
apparatus and washed once with SSC 2x. The membrane was blocked 
at room temperature in a blocking buffer (5% BSA in TBST) for 1 h and 
probed with horseradish peroxidase (HRP)-conjugated anti-biotin 
antibody. Incubation was allowed to proceed for 1h at 4 °C with rocking. 
After incubation, the antibody solution was removed and the mem- 
brane rinsed twice with TBST followed by detection by the ECL method. 


Site-directed mutagenesis 
Amino acid substitutions were performed with the primers as indicated 
in the key resources table. Primers were designed with the QuikChange 


Primer Design Software (Agilent Technologies). Single mutants were 
generated using the QuikChange Lightning Site-Directed Mutagenesis 
kit (Agilent Technologies) and double mutants were created with the 
QuikChange Lightning Multi Site-Directed Mutagenesis kit (Agilent 
Technologies) according to the manufacturer’s instructions. The gener- 
ated mutants were verified by sequencing to screen against spurious 
secondary mutations. For primer sequences, see Supplementary Table 2. 


In situ PLA 

Cells were plated on coverslips at a density of 5 x 10⁴ in 24-well plates and left in 
culture conditions overnight. The next day, cells were pre-extracted in CSK 
buffer (10 mM PIPES, pH 6.8, 100 mM NaCl, 300 mM sucrose, 3 mM MgCl2, 
1 mM EGTA and 0.5% Triton X-100), fixed with 4% formaldehyde in the CSK 
buffer for 10 min, permeabilized with PBS containing 0.5% (v/v) NP-40 for 
5 min, and blocked for 30 min with goat serum (5%) in PBS. PLA was performed 
following the manufacturer’s instructions using the Duolink anti-Mouse 
MINUS and anti-Rabbit PLUS In Situ PLA probes and the Duolink In Situ 
Detection Reagents Red (Olink Bioscience). Images were acquired with 
a Zeiss Axio Imager M1 microscope equipped with an ORCA-ER camera 
(Hamamatsu) and using the Volocity 6.3 software (Perkin Elmer). 


Cell cycle synchronization 

HEK 293 cells were synchronized by the double-thymidine-block 
method with minor modifications. In brief, cells were treated with 
2 mM thymidine (Sigma-Aldrich) for 18 h, thymidine-free medium for 
9 h to release the cells, and 2 mM thymidine was added to medium for 
an additional 16 h to arrest the cells at the G1-to-S transition. Cells were 
washed twice with PBS and then released in fresh complete DMEM. Cells 
were analysed at 70-min time intervals by immunoblotting and in situ 
PLA assay. For synchronization in mitosis, a thymidine-nocodazole 
block was used. In brief, cells at a confluence of 60% were treated with 
2 mM thymidine for 24 h, washed twice in PBS, and released into 
complete DMEM for 3 h. Next, cells were treated with 50 µg ml−1 of 
nocodazole (Sigma-Aldrich) for 15 h, and the cells were washed twice 
with PBS and a fresh complete medium was added to the cell culture. 
Synchronized cells were analysed at 90-min time intervals by western 
blotting with antibodies as indicated. 
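
The double-thymidine schedule described above can be laid out as a simple timeline for planning purposes. The following is a minimal sketch, not the authors' procedure: the function names are illustrative, sampling is assumed to start at the moment of release, and the number of collection points is a hypothetical parameter because the text gives only the block durations and the 70-min sampling interval.

```python
# Durations (hours) taken from the double-thymidine protocol in the text.
FIRST_BLOCK_H = 18
RELEASE_H = 9
SECOND_BLOCK_H = 16
SAMPLING_INTERVAL_MIN = 70


def hours_until_release():
    """Total time from the first thymidine addition to the final release."""
    return FIRST_BLOCK_H + RELEASE_H + SECOND_BLOCK_H


def sampling_times_min(n_points, interval_min=SAMPLING_INTERVAL_MIN):
    """Minutes after release at which cells are collected.

    Assumption: the first sample is taken at release (0 min).
    n_points is hypothetical; the source does not state it.
    """
    return [i * interval_min for i in range(n_points)]


if __name__ == "__main__":
    print(hours_until_release())
    print(sampling_times_min(4))
```

For the thymidine-nocodazole block, the same helper could be called with `interval_min=90` to match the 90-min sampling interval given in the text.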


Indirect immunofluorescence 

Cells were washed with PBS and fixed with 4% formaldehyde for 10 min 
at room temperature, permeabilized with 0.3% Triton X-100 in PBS for 5 
min at room temperature and then blocked with 3% BSA, 10% FBS in PBS 
for 1 h at room temperature. Samples were then incubated with rabbit 
anti-53BP1 antibody overnight at 4 °C, washed with 0.05% Tween-20 
in PBS and incubated with anti-rabbit IgG AlexaFluor 594 (Molecular 
Probes). DNA was counterstained with DAPI and images were acquired 
using a Zeiss Axio Imager M1, using a Hamamatsu digital camera and 
the Volocity 4.3.2 software (Perkin Elmer). 


Airyscan super-resolution imaging 

Sample preparation for super-resolution microscopy, cross-linking 
efficiency determination and Airyscan imaging were performed as 
described previously”. 


Super-resolution microscopy scoring criteria 

Images obtained from super-resolution microscopy were scored as previously 
described”. Specifically, after capture and processing, images were 
exported to ImageJ as .tif images with maintained scales. Images were 
manually quantified with researchers blinded to the experimental conditions. 
Telomere molecules were scored if they had a traceable telomere 
contour of >1 µm, and contained no gaps in telomere staining >500 nm. 
Molecules were classified as t-loops when we could discern an individual 
molecule consisting of a closed loop structure with a single attached tail. 
Molecules were classified as linear when we observed an individual molecule 
with two visible ends, containing no loops or branched structures. 
All molecules that did not conform to the looped or linear definition were 
classified as ambiguous. Densely packed areas of coverslips with overlapping 
telomere molecules were not scored. Each loop and linear molecule 
was measured for contour length using the ImageJ trace function. 
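
The scoring rules above amount to a small decision procedure. The sketch below is illustrative only: scoring in the study was done manually in ImageJ by blinded researchers, and the `Molecule` fields, `classify` and `percent_t_loops` are hypothetical names that reduce that visual judgement to pre-annotated counts. Excluding ambiguous molecules from the percentage follows the Fig. 4b legend.

```python
from dataclasses import dataclass

MIN_CONTOUR_UM = 1.0  # molecules need a traceable contour of >1 µm
MAX_GAP_NM = 500.0    # and no gap in telomere staining >500 nm


@dataclass
class Molecule:
    contour_um: float          # traced contour length (µm)
    max_gap_nm: float          # largest gap in telomere staining (nm)
    closed_loops: int          # number of closed loop structures
    tails: int                 # tails attached to a loop
    free_ends: int             # visible ends
    branched: bool = False     # any branched structure present
    overlapping: bool = False  # lies in a densely packed, overlapping area


def classify(m: Molecule) -> str:
    """Return 't-loop', 'linear', 'ambiguous' or 'not scored'."""
    if m.overlapping or m.contour_um <= MIN_CONTOUR_UM or m.max_gap_nm > MAX_GAP_NM:
        return "not scored"
    if m.closed_loops == 1 and m.tails == 1 and not m.branched:
        return "t-loop"
    if m.closed_loops == 0 and m.free_ends == 2 and not m.branched:
        return "linear"
    return "ambiguous"


def percent_t_loops(molecules) -> float:
    """Percentage of t-loops among looped plus linear molecules.

    Ambiguous and unscored molecules are excluded from the denominator.
    """
    calls = [classify(m) for m in molecules]
    loops, linear = calls.count("t-loop"), calls.count("linear")
    if loops + linear == 0:
        raise ValueError("no looped or linear molecules to score")
    return 100.0 * loops / (loops + linear)
```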


PNA FISH and immunofluorescence FISH 

Telomeric PNA FISH on cytogenetic chromosome spreads was per- 
formed as previously described’. In brief, cells were treated with 
0.2 µg ml−1 of colcemid for 90 min to arrest cells in metaphase. Trypsinized 
cells were incubated in 75 mM KCl, fixed with methanol:acetic 
acid (3:1), and spread on a glass slide. To preserve the chromosome 
architecture better, the slides were rehydrated in PBS for 5 min, fixed 
in 4% formaldehyde for 5 min, treated with 1 mg ml−1 pepsin for 10 min 
at 37 °C, and fixed in 4% formaldehyde for 5 min. Next, slides were 
dehydrated in 70%, 85% and 100% (v/v) ethanol for 15 min each and 
air-dried. Metaphase chromosome spreads were hybridized with telo- 
meric TAMRA-TelG 5’-(TTAGGG),-3’ PNA probe (Panagene) and slides 
were mounted using ProLong Gold antifade with DAPI (Life Technolo- 
gies). Chromosome images and telomere signals were captured using 
Zeiss Axio Imager M1 microscope equipped with an ORCA-ER camera 
(Hamamatsu) controlled by Volocity 6.3 software (Improvision). For 
interphase immunofluorescence FISH (TIFs), cells grown on #1.5 glass 
coverslips were fixed for 20 min in 2% (w/v) formaldehyde (Thermo 
Scientific) at room temperature and immunofluorescence FISH was 
performed as previously described’, using primary 53BP1 antibody (see 
Supplementary Table 1), anti-mouse Alexa Fluor secondary antibody 
(Molecular Probes) and a TAMRA-TelG 5’-(TTAGGG),-3’ PNA probe 
(Panagene). Metaphase TIF assays were done as previously described". 
In brief, cells were treated with 20 ng ml−1 colcemid for 1 h before collecting 
and resuspending in a hypotonic buffer of 0.2% trisodium citrate in 
0.2% KCl. The cells were swollen for 5 min then cytocentrifuged onto 
glass slides using a Tharmac Cellspin 1, before fixation and processed 
for immunolabelling with an anti-γ-H2AX primary antibody and subsequently 
with anti-mouse Alexa Fluor 568 secondary antibody (Molecular 
Probes). After a second fixation, the samples were hybridized with 
an Alexa Fluor 488 conjugated C-rich telomere PNA probe (Panagene), 
stained with DAPI and mounted with ProLong Gold (Molecular Probes). 
Automated metaphase finding and image capture was done as previ- 
ously described” using a MetaSystems imaging platform coupled with 
a ZEISS AxioImager Z.2 microscope using a 63x, 1.4 NA oil objective 
and appropriate filter cubes, and a CoolCube1 camera (MetaSystems). 
After acquisition, images were imported into ImageJ (NIH) and Adobe 
Photoshop CS5 for manual quantification and processing. 


Telomere circle assay 

Cells grown at a confluence between 70% and 80% were collected from 
two 10-cm dishes and extraction of genomic DNA for T-circle assay was 
performed as previously described’. Total gDNA was digested by AluI/HinfI 
restriction enzymes and the TCA assay was performed with two 
essential modifications as described’: (1) Phi29 DNA (Thermo Scientific) 
polymerization used a mammalian telomere primer; and (2) Southern 
blotting membrane was hybridized to a γ-32P-labelled (TTAGGG), 
telomeric probe. Southern blot images were captured with Storm 840 
scanner and the extent of [32P] incorporation was quantified from the 
autoradiographs by ImageQuant TL Software Analyzer (Amersham 
Biosciences). The level of γ[32P] incorporation obtained from the Phi29- 
negative control samples represented the background level, which was 
subtracted from the values obtained from the samples that contained 
the Phi29 DNA polymerase. 
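
The background correction described above is a simple subtraction, and the Extended Data Fig. 5f legend additionally expresses the result relative to control siRNA cells set to 100%. The following sketch illustrates that arithmetic under stated assumptions: the function names and sample labels are hypothetical, and the intensity values are made up for illustration and are not data from the study.

```python
def background_corrected(signal_with_phi29, signal_without_phi29):
    """Subtract the Phi29-negative background from the Phi29-positive signal."""
    return signal_with_phi29 - signal_without_phi29


def percent_of_control(corrected, control_key="control"):
    """Express each corrected signal as a percentage of the control sample."""
    reference = corrected[control_key]
    if reference == 0:
        raise ValueError("control signal is zero after background subtraction")
    return {name: 100.0 * value / reference for name, value in corrected.items()}


# Made-up intensities in arbitrary units: (+Phi29, -Phi29) per sample.
raw = {
    "control": (1200.0, 200.0),
    "sample": (700.0, 200.0),
}
corrected = {
    name: background_corrected(plus, minus) for name, (plus, minus) in raw.items()
}
relative = percent_of_control(corrected)
```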


DNA combing 

Cells were sequentially pulse-labelled with 25 µM CldU (Sigma) and 
250 µM IdU (Sigma) for 20 min and, after collection of the cells, low-melting 
agarose (Sigma) plugs each containing 200,000 cells were prepared. 
DNA fibres were extracted from the plugs and combed on silanized 
coverslips using the FiberPrep DNA extraction kit and the molecular 
combing system (Genomic Vision), according to the manufacturer’s 
instructions. Combed fibres were fixed at 60 °C for 2h and DNA was 
denatured in 0.5 M NaOH for 25 min. Fibres were then blocked in 1% 
BSA in 0.1% Tween-20 in PBS for 1h, incubated with rat anti-BrdU (detects 
CldU, BU1/75, AbD Serotec, 1:200) and mouse anti-BrdU (detects IdU, 
B44, BD Biosciences, 1:200) for 1h followed by anti-rat IgG AlexaFluor 594 
and anti-mouse IgG AlexaFluor 488 (Molecular Probes, 1:500) for 1.5 h. 
For DNA combing with single-stranded DNA (ssDNA) staining, cells 
were sequentially pulse-labelled with 25 µM CldU (Sigma) and 250 µM 
IdU (Sigma) for 15 min, and, after collection of the cells, low-melting 
agarose (Sigma) plugs each containing 250,000 cells were prepared. 
DNA fibres were extracted from the plugs and combed on silanized 
coverslips using the FiberPrep DNA extraction kit and the molecular 
combing system (Genomic Vision), according to the manufacturer’s 
instructions. Combed fibres were fixed at 60 °C for 2h and DNA was 
denatured in 0.5 M NaOH for 25 min. Fibres were then blocked in 
1% BSA in 0.1% Tween-20 in PBS for 1 h, incubated with rat anti-BrdU 
(detects CldU, abcam, ab6326, 1:500) and mouse anti-BrdU (detects 
IdU, B44, BD Biosciences, 1:250) for 1 h followed by anti-rat IgG AlexaFluor 
594 and anti-mouse IgG AlexaFluor 488 (Molecular Probes, 
1:500) for 1.5 h. Fibres were then incubated with mouse anti-ssDNA 
antibody (Millipore, MAB3034, 1:200) for 45 min followed by anti- 
mouse IgG AlexaFluor 647 (Molecular Probes, 1:200) for 45 min. 
Images were acquired using a Zeiss Axio Imager M1, equipped with 
Hamamatsu digital camera and the Volocity software (Perkin Elmer). 
Fibre length was analysed using ImageJ (http://rsbweb.nih.gov/ij/). 


Mass spectrometric analyses and protein identification 
Coomassie-stained polyacrylamide gel slices were excised from 
SDS-PAGE gels using a scalpel and processed for mass spectrometry 
using the Janus liquid handling system (PerkinElmer). In brief, the 
excised protein gel pieces were placed in individual wells of a 96-well 
microtitre plate and destained with 50% (v/v) acetonitrile and 50 mM 
ammonium bicarbonate, reduced with 10 mM DTT, and alkylated with 
55 mM iodoacetamide. After alkylation, the samples were digested 
with trypsin (Promega), overnight at 37 °C. The resulting peptides were 
extracted in 1% (v/v) formic acid, 2% (v/v) acetonitrile. Digests were 
subsequently analysed by nano-scale capillary LC-MS/MS. Peptide 
mixtures were separated on a 50 cm, 75 µm i.d. EasySpray C18 LC-MS 
column over a 30-min gradient and eluted directly into the LTQ Orbitrap 
Velos (Thermo Scientific) mass spectrometer. The mass spectrometer 
was operated in data dependent mode with the top-10 most-intense 
multiply charged precursor ions fragmented in the linear ion trap using 
collision-induced dissociation. Raw mass spectrometric data were 
processed in MaxQuant’® (v.1.3.0.5) for peptide and protein identification; 
the database search was performed using the Andromeda search 
engine against the Homo sapiens canonical sequences downloaded 
from UniProtKB (release 2012_08). 


Statistical analysis 
Statistical analyses were performed using GraphPad PRISM version 7.0 
software (GraphPad). Statistical significance of data was assessed by 


two-tailed Student’s t-test or one-way ANOVA unless noted otherwise. 
Data represent mean ± s.e.m. or mean ± s.d. as indicated. P > 0.05 was 
considered not significant. No statistical methods were used to pre- 
determine sample size. The experiments were not randomized, and 
investigators were not blinded to allocation during experiments and 
outcome assessment unless stated otherwise. 
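
For readers who want to reproduce the summary statistics outside GraphPad PRISM, the sketch below shows how mean ± s.e.m. and a one-way ANOVA F statistic are computed. It is an illustration under stated assumptions rather than the authors' analysis: the function names are hypothetical, it uses only the Python standard library, and it stops at the F statistic and degrees of freedom (the P value would be read from the corresponding F distribution, which is not implemented here).

```python
import math
import statistics


def mean_sem(values):
    """Return (mean, s.e.m.), with s.e.m. = sample s.d. / sqrt(n)."""
    n = len(values)
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        sum((x - statistics.mean(g)) ** 2 for x in g) for g in groups
    )
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```

With three made-up groups `[1, 2, 3]`, `[2, 3, 4]` and `[3, 4, 5]`, the between-group and within-group sums of squares are both 6, giving F = 3 with 2 and 6 degrees of freedom.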


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


The mass spectrometry proteomics dataset is publicly available 
through ProteomeXchange Consortium via the PRIDE partner repository 
with the dataset identifier PXD014843. Source Data for Figs. 1-4 
and Extended Data Figs. 1-8 are available with the online version of 
the paper. All other data are available from the corresponding author 
upon reasonable request. 
Extended Data Fig. 1 | TRF2 is phosphorylated at Ser365. a, Annotated 
spectrum for the TRF2 phosphorylated peptide. The data were acquired on the 
LTQ Orbitrap Velos and processed in MaxQuant v.1.3.0.5 with the database 
search performed against the canonical Homo sapiens sequences from 
UniProt. For the spectrum shown, the posterior error probability value was 
0.018258 and the localization score for the site was 1 DLVLPTQALPAS(1)PALK. 
b, HEK 293 cells were released from a double-thymidine block (left) or a 
thymidine plus nocodazole block (right). Cells were subjected to SDS-PAGE 
analysis and progression through the cell cycle was monitored by 
immunoblotting with cell cycle markers as indicated. Asterisks indicate time 
points after synchronization. c, PCR analysis of genomic DNA isolated from 
Trf2‘ MEFs stably expressing empty vector, wild-type or mutant TRF2, 96 h 
after infection with control or Cre-expressing adenovirus. d, Western blotting 
analysis of the cells described in c to monitor loss of endogenous TRF2 after Cre 
expression and to determine complementation efficiency with ectopic wild-type 
and mutant TRF2. The asterisk indicates endogenous TRF2. In a-d, the 
experiments were independently repeated at least twice with similar results. 
Extended Data Fig. 2 | Mutations of TRF2 at the Ser365 or Ser367 phospho-site 
do not affect interaction with shelterin components. a, Whole-cell 
extracts from HEK 293 cells stably expressing empty vector, wild-type or 
mutant Myc-tagged TRF2 as indicated were immunoprecipitated with anti-Myc 
antibody or normal mouse IgG. Protein complexes were analysed with 
antibodies against RAP1, TRF1 and Myc. b, Trf2 MEFs expressing wild-type or 
phospho-dead (Ser367Ala mutant) TRF2 were transfected with either control 
siRNA (non-targeting control, NTC) or siRNA against RAP1 (siRAP1) and treated 
with 4-OHT for 96 h. Whole-cell extracts were analysed 72 h later as indicated. 
c, Quantification (left) and representative images (right) of chromosome 
fusions in the Trf2‘" MEFs depicted in b performed 96 h after 4-OHT treatment 
(n = 30 metaphases analysed). Data are mean ± s.e.m. P values were determined 
by one-way ANOVA. In a-d, the experiments were independently repeated at 
least twice with similar results. 
Extended Data Fig. 3 | Inhibition of MEK-ERK signalling pathway does not 
affect TRF2 phosphorylation at Ser365 or Ser367. a, Quantity screen for 
TRF2-biotinylated peptides. Slot-blot assay in which biotin-tagged TRF2 
peptides were incubated with streptavidin-coated beads to ensure that the 
correct amounts were used in the peptide pull-down assay. b, HEK 293 cells 
(left) or Rtel1* MEFs (right) were pre-treated with vehicle control (DMSO) or 
with 25 µM of MEK1/2 kinase inhibitor (U0126) for 48 h. Whole-cell extracts 
were subjected to SDS-PAGE analysis followed by immunoblotting with 
antibodies as indicated. In a and b, the experiments were independently 
repeated at least twice with similar results. 
Extended Data Fig. 4 | Identification of TRF2- and RTEL1-interacting 
phosphatases and protein phosphatase regulatory subunits. a, Intensity-based 
absolute quantification (iBAQ) scatter plots comparing protein 
abundance in cells synchronized during S phase versus asynchronous control 
cells. Immunoprecipitates from asynchronous or S-phase-synchronized HEK 
293 cells stably expressing Flag-haemagglutinin (HA)-tagged RTEL1 (top), 
N-terminal FLAP (Flag-GFP)-tagged RTEL1 (middle) or Myc-TRF2 (bottom) 
were separated by SDS-PAGE and stained with Coomassie blue to visualize 
proteins. Immunoprecipitations with haemagglutinin (top), GFP (middle) and 
Myc (bottom) antibodies were performed. The proteins along the entire length 
of the gel were extracted and analysed by liquid chromatography-tandem 
mass spectrometry (LC-MS/MS). b, HEK 293 cells stably expressing wild-type 
Myc-TRF2 were transfected with either non-target control or siRNA against 
protein phosphatase regulatory subunits, as specified. Three days later, 
protein levels were analysed with the indicated antibodies. c, FLAP-tagged 
RTEL1 HEK 293 cells expressing Myc-tagged wild-type TRF2 were transfected 
with either control siRNA or siRNA against PP4R2 or PP6R3. Whole-cell extracts 
were immunoprecipitated with anti-Flag antibody and immunocomplexes 
were analysed for Myc (TRF2) and Flag (RTEL1). Inputs (5%) are shown on the 
right. d, HEK 293 cells expressing wild-type Myc-TRF2 (left) or Flag-HA-tagged 
RTEL1 (right) were subjected to immunoprecipitation with normal rabbit IgG 
or antibodies against PP4R2 and PP6R3. Immune complexes were analysed by 
western blotting with the indicated antibodies. In b-d, the experiments were 
independently repeated at least twice with similar results. 
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Extended Data Fig. 5 | PP6R3 controls phosphorylation of TRF2 at Ser365 or 
Ser367. HEK 293 cells expressing wild-type Myc-TRF2 were transfected with a 
non-targeting control siRNA or siRNAs against protein phosphatase regulatory 
subunits (a) or catalytic subunits (b). Cells were collected, and whole-cell 
extracts were immunoprecipitated with anti-RTEL1 antibody. 
Immunocomplexes were resolved by SDS-PAGE and analysed by western 
blotting as indicated. c, d, HEK 293 cells (c) and Trf2” MEFs (d) expressing 
Myc-tagged wild-type TRF2 were transfected with control siRNA or siRNA targeting 
PP4R2 or PP6R3 (Pp4r2 or Pp6r3 for MEFs). Whole-cell extracts were 
immunoprecipitated with anti-TRF2 antibody, and immunocomplexes were 
resolved by SDS-PAGE and analysed for human phospho-TRF2 (pS365 TRF2; 
left panel in c) or mouse phospho-TRF2 (pS367 TRF2; left panel in d). e, Top, 
frequency of telomere loss and telomere fragility per metaphase in Rtel1’*” 
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MEFs transfected with control siRNA or with Pp4r2 or Pp6r3 siRNA (n=58 
(NTC), n=57 (Pp4r2), and n=55 (Pp6r3) of analysed metaphases). Efficiency of 
siRNA knockdown was determined by western blotting with PP6R3 and PP4R2 
antibodies as indicated. Data are mean ± s.e.m. P values determined by one-way 
ANOVA. Bottom, representative images of the telomere FISH experiments. The 
arrowheads show loss of telomere signal. Red, telomere PNA FISH; blue, DAPI. 
f, Phi29-dependent telomere circles (top) detected in cells as indicated in e. 
The extent of [32P] incorporation was quantified (bottom) from the 
autoradiographs, and the level of [32P] incorporation by cells transfected with 
control siRNA was arbitrarily assigned a value of 100%. Data are mean ± s.d. 
from two independent experiments. P values determined by one-way ANOVA. 
In a-f, the experiments were independently repeated at least twice with similar 
results. 
Extended Data Fig. 6 | Replication defects in Trf2” MEFs in the absence of 
TRF2 phosphorylation at Ser365 or Ser367. a, Quantification of global 
replication fork dynamics (left) and rates of replication fork progression 
(right) of the IdU/CldU double pulse-labelling experiment in Trf2” MEFs 
complemented with empty vector, wild-type or mutant TRF2, performed 96 h 
after infection with control- or Cre-expressing adenovirus (n denotes number 
of analysed forks). Data are mean ± s.e.m. of triplicate experiments. Box plots 
are as in Fig. 3e. b, Representative images of the experiment from a. 
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c-e, Quantification of micronuclei (c; 500 nuclei per replicate), mitotic 
catastrophe (d; 500 nuclei per replicate), and 53BP1 foci frequency (e; 150 
nuclei per replicate) in Trf2” MEFs complemented as in a. Data are 
mean ± s.e.m. of three (c, d) or two (e) independent experiments. f, DNA 
damage in Trf2‘” MEFs complemented as in a was estimated by counting the 
frequency of cells with five or more 53BP1 foci. For each independent 
experiment (n = 2), a minimum of 150 nuclei of each condition were analysed. 
All P values were determined by one-way ANOVA. 
Extended Data Fig. 7 | Suppression of constitutive binding of RTEL1 to the 
TRF2(S367A) phospho-dead mutant rescues replication defects in MEFs. 

a, Quantification of rates of replication fork progression (left) and 
representative images (right) of the IdU/CldU double pulse-labelling 
experiment in double-knockout Trf2";Rtel1 mouse ear fibroblasts stably 
expressing Myc-TRF2(S367A), together with wild-type V5-RTEL1 (WT) or C4C4 
mutant V5-RTEL1(R1237H) (R/H), performed 96 h after infection with 
Cre-expressing adenovirus. Data are mean ± s.e.m. of triplicate experiments. 

b, Quantification of replication fork dynamics (top) and fork asymmetry 
(bottom) from cells as in a. Staining with anti-ssDNA antibody (right) was used 
to exclude broken DNA tracks (n denotes number of analysed forks). Box plots 
are as in Fig. 3e. Data are mean ± s.e.m. c-e, Quantification of the frequency of 
micronuclei (c; 500 nuclei per replicate), mitotic catastrophe (d; 500 nuclei per 
replicate), and 53BP1 foci (e; 150 nuclei per replicate) in Trf2”Rtel1* mouse ear 
fibroblasts complemented as indicated in a. Data are mean ± s.e.m. of three (c, d) 
or two (e) independent experiments. f, DNA damage in Trf2*;Rtel1 mouse ear 
fibroblasts complemented as in a was estimated by counting the frequency of 
cells with five or more 53BP1 foci. For each independent experiment (n = 2), 
a minimum of 150 nuclei of each condition were analysed. All P values were 
determined by one-way ANOVA. 
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Extended Data Fig. 8 | TRF2(S367A) mutation induces TIFs and impairs 
formation of t-loops. a, Quantification (top) of the cross-linking efficiency test 
(bottom) in the Trf2 MEFs stably expressing wild-type or S367A mutant 
Myc-TRF2 120 h after treatment with 4-OHT (n = 3 independent biological 
replicates). Data are mean ± s.e.m. b, Left, quantification of TIFs per interphase 
in Trf2 MEFs complemented with empty vector, wild-type Myc-TRF2, 
phospho-dead mutant TRF2(S367A), or phospho-mimetic Myc-TRF2(S367D) 
and Myc-TRF2(S367E) mutants 96 h after infection with Cre-expressing 
adenovirus. Right, representative interphase TIF images. c, Quantification of 
TIFs per interphase in Trf2"Rtel1'* mouse ear fibroblasts stably expressing 
Myc-TRF2(S367A) together with wild-type V5-RTEL1 or mutant 
V5-RTEL1(R1237H) 96 h after infection with GFP- or Cre-expressing adenovirus. 


d, Quantification (left) and representative images (right) of RPA staining at TIFs 
in Trf2“;Rtel1” mouse ear fibroblasts complemented as in c. Analysis was 
carried out 96 h after infection with Cre-expressing adenovirus. Data in b-d are 
mean ± s.d. from three independent experiments (n = 100 cells in each 
treatment group analysed per independent experiment). e, Measurement of 
linear and t-loop molecules shown in Fig. 4c (n = 3 biological replicates scoring 
>1,192 molecules per replicate). T-loop measurements are a sum of the loop and 
tail portions of the molecule. Data are mean ± s.e.m. f, Measurement of the loop 
portion of t-loops from the experiments depicted in Fig. 4c (n = 3 biological 
replicates scoring >1,192 molecules per replicate). All P values were determined 
by one-way ANOVA. 
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Secondary active transporters, which are vital for a multitude of physiological 


processes, use the energy of electrochemical ion gradients to power substrate 
transport across cell membranes. Efforts to investigate their mechanisms of action 
have been hampered by their slow transport rates and the inherent limitations of 
ensemble methods. Here we quantify the activity of individual MhsT transporters, 
which are representative of the neurotransmitter:sodium symporter family of 
secondary transporters’, by imaging the transport of individual substrate molecules 
across lipid bilayers at both single- and multi-turnover resolution. We show that MhsT 
is active only when physiologically oriented and that the rate-limiting step of the 
transport cycle varies with the nature of the transported substrate. These findings are 
consistent with an extracellular allosteric substrate-binding site that modulates the 
rate-limiting aspects of the transport mechanism, including the rate at which the 
transporter returns to an outward-facing state after the transported substrate is 
released. 


Secondary active transporters are integral membrane proteins that 
use electrochemical ion gradients to translocate substrates across 
cellular membranes to subserve numerous physiological functions, 
which range from nutrient uptake, ion homeostasis and antimicrobial 
efflux to neurotransmission in humans. Neurotransmitter:sodium 
symporters (NSS) modulate synaptic activity by clearing the synapse 
of neurotransmitters. Despite their physiological and medical importance 
as molecular targets for therapeutic and recreational drugs, 
the individual steps of the transport cycle, and their regulation by 
neuromodulatory agents, remain poorly understood. 

Ensemble electrophysiological, radiotracer and fluorometry studies 
have shed light on the transport mechanism for specific secondary 
active transporters, including those in the NSS family. Such measurements 
necessarily infer the rates of specific steps in the transport 
cycle using complex and potentially disruptive biochemical or physiological 
manipulations. Single-molecule methods have previously 
been used to examine dynamics in individual transporters associated 
with key conformational states visited during the transport cycle. 
However, no method yet exists to monitor directly solute transport by 
secondary transporters at the single-molecule level. 

Here we report an approach based on single-molecule fluorescence 
resonance energy transfer (SmFRET) that provides precise measure- 
ments of the activities of single transporters, and which has the capacity 
to detect individual steps in the NSS transport cycle. This approach does 
not involve modifications of the transporter itself but instead reports 
the movement of the transported substrate molecules into the lumen 
of proteoliposomes, using an encapsulated sensor protein. We 
demonstrate the utility of this method by investigating the prokaryotic 
NSS homologue MhsT, a hydrophobic amino acid transporter that 
exhibits transport rates similar to those of human NSS proteins. Single-molecule 
studies revealed that MhsT efficiently catalyses substrate 
transport only when the transporter is in the physiological orientation 
in the membrane. Single-turnover measurements revealed the kinetics 
of the first half-cycle of transport, which includes substrate binding, 
transporter isomerization and substrate release. Comparative kinetic 
analyses of single- and multi-turnover transport rates delineated the 
kinetics of the second half-cycle, which returns the transporter to the 
outward-facing state after release of substrate into the lumen. These 
measurements revealed that the rate of the return step, in which the 
transporter is classically modelled as devoid of substrate, is dependent 
on the identity of the substrate that is transported in the first half of 
the transport cycle. Consistent with these findings, we found that the 
time-averaged occupancy of the inward open state of MhsT in living 
cells was dependent on the identity of the substrate being transported. 
These insights provide compelling evidence that an allosteric ligand-binding 
site in the extracellular vestibule of MhsT (and potentially 
also in related NSS homologues) contributes to both the function 
and regulation of the transport cycle. 


Engineered amino acid sensors 


Bacterial periplasmic binding proteins help to support cellular growth 
and proliferation by scavenging substrates from the environment of 
the cell using a clamshell-like closure mechanism. To enable the 
detection of ligand binding via FRET, we attached donor (LD550) and 
acceptor (LD650) fluorophores in a site-specific manner within the 
opposing domains of the leucine, isoleucine, valine periplasmic binding 
protein (LIV-BP) of Escherichia coli via the engineered cysteine 


‘Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY, USA. Department of Structural Biology, St. Jude Children’s Research Hospital, Memphis, TN, USA. 
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Fig. 1| Design and characterization of a hydrophobic amino acid sensor. 

a, Apo (left) (RCSB Protein Data Bank code (PDB) 1Z15) and substrate (orange)- 
bound (right) (PDB 1Z16) crystal structures of LIV-BP. Locations of 
fluorophores are indicated by stars. b, Acceptor fluorescence of LIV-BP™" 
under donor excitation was measured in the presence of the indicated 
concentrations of isoleucine, leucine, valine, alanine, methionine, 
phenylalanine, glycine, proline, asparagine and tryptophan. c, Acceptor 
fluorescence of LIV-BP™' (WT) and LIV-BPSS (C53S/C78S) in the presence of the 


residues N67C and E181C (hereafter, LIV-BP™") (Fig. 1a, Supplementary 
Methods). 

We first characterized the binding properties of LIV-BP™’ using bulk 
fluorimetry methods (Supplementary Methods) and observed a robust 
40% increase in acceptor fluorescence upon addition of saturating 
ligand concentrations (Fig. 1b), consistent with the expected distance 
change upon ligand binding. LIV-BP“ bound leucine, isoleucine and 
valine with nanomolar affinity, exhibiting dissociation constants (K_D) 
(approximately 40, 15, and 70 nM, respectively) that were independent 
of buffer conditions (Fig. 1b, Extended Data Fig. 1, Supplementary 
Table 1). 

Given that one substrate molecule within the lumen of a 100-nm vesicle 
equates to approximately 3.5 μM, a single substrate molecule is 
expected to saturate a single encapsulated LIV-BP™' sensor (Supplementary 
Note 1). To enable the detection of multiple transport cycles, 
we engineered a lower-affinity sensor in which the two native cysteines 
within LIV-BP were mutated to serine residues (C53S and C78S) (hereafter, 
LIV-BPSS). The LIV-BPSS sensor displayed similar ligand-dependent 
increases in FRET efficiency, but exhibited more than 100-fold 
decreases in leucine, isoleucine and valine binding affinities (5-30 μM) 
(Fig. 1c, Supplementary Table 1). 
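
The single-molecule concentration quoted above can be checked with a short calculation. The sketch below is illustrative only: it treats the lumen as an ideal sphere whose diameter equals the nominal 100-nm vesicle diameter and ignores bilayer thickness, both of which are assumptions rather than details taken from the Supplementary Note. Under those assumptions it gives about 3.2 μM, the same order as the roughly 3.5 μM stated in the text; a slightly smaller lumen would raise the value.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_to_molar(n_molecules: int, lumen_diameter_nm: float) -> float:
    """Molar concentration of n molecules confined in a spherical lumen.

    Assumption: the lumen is a perfect sphere of the given diameter.
    """
    radius_dm = (lumen_diameter_nm / 2.0) * 1e-8  # 1 nm = 1e-8 dm
    volume_litres = (4.0 / 3.0) * math.pi * radius_dm ** 3  # 1 dm^3 = 1 L
    return n_molecules / (AVOGADRO * volume_litres)


# One substrate molecule in a nominal 100-nm vesicle, in micromolar.
single_molecule_um = molecules_to_molar(1, 100.0) * 1e6
print(f"1 molecule in a 100-nm lumen ~ {single_molecule_um:.2f} uM")
```

Because this effective concentration is far above a nanomolar dissociation constant, a single transported molecule would be expected to saturate a high-affinity sensor, which is the reasoning the paragraph relies on.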


Analogue and digital amino acid sensing 

We evaluated the rates of leucine binding to the LIV-BP™' and LIV-BPSS 
sensors by imaging the FRET behaviours of thousands of individual, 
surface-immobilized sensors simultaneously using wide-field, 
prism-based total-internal reflection fluorescence imaging (Supplementary 
Methods). In the absence of ligands and at a 100-ms time 
resolution, both sensors displayed a mean FRET efficiency of about 0.4 
and exhibited few (if any) spontaneous fluctuations in FRET (Extended 
Data Fig. 2a, b). In the presence of leucine, LIV-BP™' displayed discrete 
transitions to a higher FRET state (mean FRET efficiency of about 0.70), 
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indicated concentrations of leucine. Lines are fits to dose-response functions 
(Supplementary Table 1). d, FRET distributions at a range of nanomolar leucine 
concentrations for LIV-BP™". e, Single molecule trace of LIV-BP™' at 40 nM 
leucine and 25-ms time resolution. f, FRET distributions at a range of 
micromolar leucine concentrations for LIV-BPSS. g, Single molecule traces of 
LIV-BPSS at 3.8 μM leucine and 100-ms (top) or 0.25-ms time resolution 
(bottom). All experiments were repeated at least three times with similar 
results. 


the occupancy of which increased as a function of substrate concentration 
(Fig. 1d, e). By contrast, the LIV-BPSS sensor displayed a single peak 
in the FRET histogram at all ligand concentrations, the mean value of 
which increased in an analogue fashion, reaching a plateau at about 
0.70 FRET at concentrations more than 10-fold above its K_D (approximately 
50 μM) (Fig. 1f). Hypothesizing that this analogue FRET response 
reflected the time-averaging of rapid FRET fluctuations between open 
and closed states, we increased the time resolution of the imaging to 
below 1 ms, at which we observed the LIV-BPSS sensor to exhibit two 
clearly defined low- and high-FRET states with absolute values that were 
indistinguishable from those exhibited by LIV-BP™' (Fig. 1g, Extended 
Data Fig. 2c, d). Detailed kinetic analyses revealed that the digital- and 
analogue-like FRET response behaviours of the LIV-BP™' and LIV-BPSS, 
respectively, reflect underlying distinctions in substrate off-rates at 
near-diffusion-limited substrate association rates (Extended Data 
Figs. 2e-j, 3a-c, Supplementary Note 2). 
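
The distinction between digital and analogue responses follows from a simple two-state binding picture, sketched below. The association rate constant (1e8 M^-1 s^-1) and the micromolar K_D used for the low-affinity sensor (10 μM) are assumed, illustrative values and are not taken from the kinetic analyses cited above; only the 40 nM K_D, the 100-ms frame time and the FRET values of about 0.40 and 0.70 come from the text.

```python
def off_rate(kd_molar: float, k_on: float) -> float:
    """Dissociation rate constant from K_D = k_off / k_on."""
    return kd_molar * k_on


def occupancy(ligand_molar: float, kd_molar: float) -> float:
    """Equilibrium fraction of sensors in the ligand-bound (closed) state."""
    return ligand_molar / (ligand_molar + kd_molar)


def time_averaged_fret(ligand_molar: float, kd_molar: float,
                       e_open: float = 0.40, e_closed: float = 0.70) -> float:
    """Mean FRET seen when binding and unbinding are faster than one frame."""
    return e_open + (e_closed - e_open) * occupancy(ligand_molar, kd_molar)


K_ON = 1e8      # M^-1 s^-1, assumed near-diffusion-limited association
FRAME_S = 0.1   # 100-ms imaging time resolution

# Mean bound-state dwell time is 1 / k_off.
high_affinity_dwell = 1.0 / off_rate(40e-9, K_ON)  # K_D = 40 nM (leucine)
low_affinity_dwell = 1.0 / off_rate(10e-6, K_ON)   # K_D = 10 uM (assumed)

print(f"high-affinity dwell: {high_affinity_dwell * 1e3:.0f} ms")
print(f"low-affinity dwell:  {low_affinity_dwell * 1e3:.0f} ms")
```

With these numbers the high-affinity sensor stays bound for longer than one frame, so discrete steps are resolved, whereas the low-affinity sensor exchanges many times per frame and reports only the occupancy-weighted mean FRET.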


Single-molecule transport by MhsT 

To measure leucine transport mediated by MhsT, we used a 
biotin-(NTA-Ni2+) bridge to surface-immobilize 100-nm proteoliposomes 
that were reconstituted under conditions that minimized the incorporation 
of more than one MhsT protein and LIV-BPSS sensor in any single 
proteoliposome (Fig. 2a, Supplementary Methods). The encapsulation 
of LIV-BPSS had no measurable effect on sensor activity (Extended Data 
Fig. 3i, j). The subpopulation of proteoliposomes that contained both 
a single MhsT and a single LIV-BPSS was computationally selected on 
the basis of MhsT transport activity. 

Pre-steady-state transport measurements were initiated in surface- 
immobilized proteoliposomes by rapidly exchanging the external 
buffer solution with different pH, Na+ and substrate concentrations, 
while imaging at 100-ms time resolution (approximately 10 times faster 
than the anticipated turnover rate). Accurate demarcation of transport 
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Fig. 2| Single-molecule, multi-turnover MhsT transport assay. a, Schematic 
of the single-molecule transport assay with liposomes containing lipids 
functionalized with 6x histidine residues and immobilized onto a 
PEG-passivated glass surface via streptavidin (SA)-biotin-(NTA-Ni2+) linker. 
b, Representative responding smFRET traces (grey) and ensemble-averaged 
FRET (red) from experiments imaging MhsT-containing proteoliposomes 
(pH 6 inside) with encapsulated LIV-BPSS before and after (vertical dashed line) 
the external addition of 0.5 μM leucine and either 150 mM (+) or 0 mM (-) Na+ at 
pH 8. This experiment was repeated at least three times with similar results. 
c, Histogram of single exponential time constants from individual FRET 
traces (bars) fit to a log-normal distribution (black line). Simulated traces 
approximate the variation in rates caused by the liposome size distribution 


initiation was established by co-injecting a fluorescently labelled tracer 
(Extended Data Fig. 4, Supplementary Methods). Using an external 
solution containing 0.5 μM leucine and 150 mM Na+, we observed that 
individual LIV-BPSS sensors underwent a rapid increase in FRET that 
reached a plateau at a value that was consistent with leucine saturation 
(Fig. 2b, Extended Data Fig. 5a, c-e). No such increase was observed in 
the absence of Na+ (Fig. 2b). These data are consistent with the observed 
FRET change arising from an encapsulated FRET sensor responding 
to a Na+-dependent concentrative translocation of leucine into the 
proteoliposome lumen over multiple transport cycles. 

We characterized the distribution of transport rates within the 
population of liposomes by fitting the individual FRET traces of 
FRET-responsive proteoliposomes to single exponential functions 
(Extended Data Fig. 5a). The distribution of time constants observed 
was well-described by a log-normal distribution (Fig. 2c); this is consistent 
with a relatively uniform transport rate of monomeric MhsT 
molecules, the variance of which closely recapitulated the expected 
distribution of extruded proteoliposome sizes (Fig. 2c). We 
therefore compiled all FRET-responsive proteoliposomes within each 
experiment to obtain a mean transport rate for the ensemble (red line 
in Fig. 2b), which we then converted to luminal substrate concentra- 
tions (Extended Data Fig. 1b, c). 
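
As an illustration of the distribution analysis described above, the following sketch estimates log-normal parameters from a set of per-trace time constants. It is a minimal example rather than the fitting procedure used in the study: the time constants are made-up values, and the parameters are obtained by maximum likelihood (mean and population standard deviation of the log-transformed values) rather than by fitting a histogram.

```python
import math
import statistics


def lognormal_params(time_constants_s):
    """Maximum-likelihood log-normal parameters (mu, sigma) of positive values."""
    logs = [math.log(t) for t in time_constants_s]
    mu = statistics.fmean(logs)
    sigma = statistics.pstdev(logs)  # population s.d. is the ML estimate
    return mu, sigma


def lognormal_pdf(t: float, mu: float, sigma: float) -> float:
    """Log-normal probability density at t > 0."""
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))


# Hypothetical single-exponential time constants (s) from individual traces.
example_taus = [0.5, 1.0, 2.0]
mu, sigma = lognormal_params(example_taus)
median_tau = math.exp(mu)  # the median of a log-normal distribution is exp(mu)
print(f"median time constant = {median_tau:.2f} s, sigma = {sigma:.3f}")
```

A narrow sigma relative to the spread expected from vesicle size alone would be consistent with the interpretation that the transporters themselves behave uniformly.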
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(blue dashed line). d, Transport rate of liposomes containing one (light grey) or 
two (dark grey) labelled MhsT transporters. Similar results were observed over 
three experiments. e, Schematic of MhsT proteoliposomes immobilized in the 
outside-out (left) or inside-out (right) orientations. f, Average FRET curves of 
transporters immobilized in the inside-out (I/O) orientation (S3C, red) and 
outside-out (O/O) orientation (N452C, black) before and after (vertical dashed 
line) the external addition of 2.5 μM leucine and 150 mM Na+ at pH 8. 
Mean ± s.e.m. represented by shaded area (n = 4 experiments). g, Radioactive 
leucine uptake using membrane vesicles of E. coli cells expressing MhsT 
in outside-out (black) and inside-out (red) orientations. Data points represent 
mean ± s.e.m. (n = 3+ experiments). 


We confirmed that over 90% of surface-immobilized, responding 
proteoliposomes contained only a single MhsT transporter using a fully 
active Cy7-labelled MhsT construct (Extended Data Figs. 6a, 7). The rate 
of Cy7-labelled MhsT transport closely mirrored that of the unlabelled 
molecule (Fig. 2d, Extended Data Fig. 9b). As expected, the minor sub- 
population of proteoliposomes that contained two Cy7-labelled MhsT 
transporters exhibited transport rates that were approximately twofold 
faster than those with a single transporter (Fig. 2d). 


MhsT transport is orientation-dependent 


The reconstitution of purified integral membrane proteins into proteoliposomes 
can give rise to both inside-in and inside-out orientations of 
the protein. The unimodal distribution of transport rates observed 
therefore suggests that: (1) only one MhsT orientation is present; (2) 
only one orientation is active; or (3) both MhsT orientations are present 
and transport at similar rates. To distinguish between these models, 
we immobilized 100-nm proteoliposomes reconstituted with MhsT 
mutants labelled with biotin-11-unit polyethylene glycol (PEG11)-maleimide, 
which retained full transport activity (Extended Data Fig. 6a, 
b), to isolate inside-out (S3C mutant, residue facing the cytoplasm) and 
outside-out (N452C mutant, residue facing extracellularly) orientations 
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Fig. 3 | Single-molecule, single-turnover measurements of first half-cycle 
MhsT transport rates. a, Schematic of the single-turnover assay. b, Cumulative 
distribution (grey squares) from one replicate fit with a single exponential (red 
line) in 50 mM external Na+ at pH 8 and 0.1 μM leucine and internal pH 6. Similar 
results were observed over many experiments. Inset, representative smFRET 
traces (blue) and idealization (red) from experiments imaging MhsT 
proteoliposomes with encapsulated LIV-BP™' before and after the addition of 
0.1 μM leucine and 50 mM Na+ (grey dashed line). Traces are representative of 
experiments performed at least three times. c, First half-cycle rates with 
varying external ion and pH conditions with 0.1 μM leucine and pH 6 inside, 
demonstrating that all three conditions are statistically significantly different 
from the basal conditions (red). **P = 0.014, ***P < 0.0001. Leak, rate of passive 
substrate leak across the lipid bilayer. d, First half-cycle rates with varying 
internal conditions with constant external conditions: 0.1 μM leucine and 
50 mM Na+ at pH 8. *P = 0.048, **P = 0.019. The red and purple bars in c and d are 
duplicated for comparison. Black diamonds represent individual data points, 
and bars represent mean ± s.e.m. (n = 3+ experiments). Significance was tested 
with a two-sided Student's t-test that was not corrected for multiple 
hypotheses. 


(Fig. 2e). As expected from proteoliposome immobilization via the 
embedded transporter, we observed a 5-10-fold enrichment in the 
number of proteoliposomes that were transport-active (Extended 
Data Fig. 8a). 

Proteoliposomes bearing a single outside-out oriented MhsT 
protein exhibited transport rates that matched those found when 
proteoliposomes were immobilized via a Ni2+-NTA bridge (Fig. 2f, 
Extended Data Fig. 8a, Supplementary Methods). By contrast, proteoliposomes 
bearing a single inside-out oriented MhsT exhibited 
no detectable transport activity (Fig. 2f, Extended Data Fig. 8a). We 
found similar results when proteoliposomes that contain MhsT were 
immobilized using two additional cytoplasmic-facing residues, P86C 
and G87C (Extended Data Fig. 8b, c); these mutants also retained full 
activity in bulk mixed-orientation assays (Extended Data Fig. 6b). We 
verified that inside-out oriented MhsT transporters are effectively 
non-functional in transport by preparing membrane vesicles of 
defined orientation from E. coli heterologously expressing MhsT 
(Fig. 2g, Extended Data Fig. 6c, Supplementary Methods). By contrast, 
an unrelated secondary active glycine transporter that is found 
natively in bacterial membranes was functional in membrane 
vesicles of both orientations (Extended Data Fig. 6d). We therefore 
conclude that MhsT efficiently catalyses transport only in the physi- 
ological (outside-out) orientation, and that proteoliposomes that 
bear inside-out oriented MhsT do not contribute to our transport 
measurements. 


Multi-turnover MhsT transport kinetics 


We next set out to determine the kinetic parameters of leucine trans- 
port by MhsT using the smFRET transport assay. In good agreement 
with ensemble measurements of the uptake of radiolabelled ligand, we 
found that the mean initial rates of MhsT transport varied as a function 
of external leucine concentration (Extended Data Fig. 9a, b) and that 
the catalytic turnover rate (k_cat) was approximately 1.18 ± 0.04 s⁻¹ and 
the Michaelis-Menten constant (K_m) was approximately 0.06 ± 0.01 μM 
(Extended Data Fig. 9b). As expected, MhsT also exhibited transport 
rates that were strongly dependent on external Na+ concentration with 
an apparent K_m of 53 ± 4 mM (Extended Data Fig. 9c). Consistent with 
MhsT being a proton antiporter, MhsT transport rates increased at 
higher external pH, and decreased with higher internal pH (Extended 
Data Figs. 9d, 10). 
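
The dependence of the initial transport rate on external leucine concentration can be illustrated with the Michaelis-Menten relation, v = k_cat [S] / (K_m + [S]). The sketch below simply evaluates that expression with the turnover rate (1.18 s^-1) and Michaelis constant (0.06 μM) reported above; the chosen substrate concentrations are examples, and the code is not the fitting routine used for Extended Data Fig. 9b.

```python
def michaelis_menten(substrate_um: float, k_cat: float, k_m_um: float) -> float:
    """Per-transporter turnover rate (s^-1) at a given substrate concentration."""
    return k_cat * substrate_um / (k_m_um + substrate_um)


K_CAT = 1.18  # s^-1, multi-turnover leucine transport
K_M = 0.06    # uM leucine

# Example external leucine concentrations in uM.
rates = {s: michaelis_menten(s, K_CAT, K_M) for s in (0.06, 0.5, 2.5)}
for conc, rate in rates.items():
    print(f"[Leu] = {conc:>4} uM -> {rate:.2f} s^-1")
```

At a substrate concentration equal to K_m the rate is half of k_cat, and at 0.5 μM and above the transporter is already operating close to its maximal rate.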


Single-turnover MhsT transport 


To investigate the underlying mechanistic features of the secondary 
transport process, we isolated the first half-cycle of transport (from 
external substrate binding to internal substrate release) by perform- 
ing analogous experiments with the LIV-BP™' sensor encapsulated in 
the lumen of a 100-nm proteoliposome, which is saturated after only 
a single transport event (Fig. 3a, Supplementary Note 1). Consistent 
with this expectation, the rapid addition of leucine and Na+ to surface-
immobilized proteoliposomes that contain a single encapsulated 
LIV-BP™' triggered step-like increases in FRET from baseline (FRET 
of about 0.40) to that of a fully saturated sensor (FRET of about 0.70) 
(Fig. 3b inset, Extended Data Fig. 5b). Notably, such events occurred 
at variable time delays after the initiation of transport. As substrate 
mixing and substrate binding to the sensor are both rapid processes 
(about 100 ms) (Extended Data Figs. 3e-h, 4), we reasoned that the 
observed delay reflects the total time required for the transporter to 
bind the substrate in an outward-facing conformation, isomerize to 
the inward-facing state and release substrate into the proteoliposome 
lumen (Fig. 3a, b). After accounting for passive substrate leak across 
the lipid bilayer, which was two orders of magnitude slower than trans- 
port (Figs. 3c, 4a), we fit the cumulative distribution of the delay times 
between transport initiation and the observed FRET event to a single 
exponential process to estimate the time required for MhsT to transit 
its first half-cycle (Fig. 3b). 
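
The delay-time analysis described above can be sketched as follows. This is an illustrative stand-in rather than the authors' procedure: the delay times are invented, the rate is estimated by maximum likelihood for an exponential distribution (events divided by total waiting time) instead of by fitting the cumulative distribution, and the leak correction assumes that passive leak and transport are independent first-order processes whose rates add.

```python
def empirical_cdf(delays_s):
    """Return (delay, cumulative fraction) pairs for plotting or fitting."""
    ordered = sorted(delays_s)
    n = len(ordered)
    return [(t, (i + 1) / n) for i, t in enumerate(ordered)]


def exponential_rate(delays_s) -> float:
    """Maximum-likelihood rate (s^-1) of a single exponential process."""
    return len(delays_s) / sum(delays_s)


def leak_corrected_rate(observed_rate: float, leak_rate: float) -> float:
    """Subtract an independent first-order leak rate (assumed additive)."""
    return max(observed_rate - leak_rate, 0.0)


# Hypothetical delays (s) between transport initiation and the FRET step.
example_delays = [0.5, 1.0, 1.5, 2.0, 3.0]
observed = exponential_rate(example_delays)
corrected = leak_corrected_rate(observed, leak_rate=0.027)  # example leak rate
cdf = empirical_cdf(example_delays)
print(f"observed = {observed:.3f} s^-1, leak-corrected = {corrected:.3f} s^-1")
```

Because each proteoliposome reports one event, the delay distribution across many vesicles plays the role that a time course plays in an ensemble measurement.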

At subsaturating external substrate concentrations (0.1 μM leucine 
and 50 mM Na+) and an outwardly directed pH gradient (pH 6 inside 
and pH 8 outside), we estimated the first half-cycle transport rate of 
MhsT to be about 0.62 ± 0.08 s⁻¹ (Fig. 3c, Supplementary Note 1). As 
expected, substrate accumulation was not observed in the absence 
of Na+ (Fig. 3c). The rate of single-turnover transport decreased by 
nearly an order of magnitude when the external pH was decreased from 
8 to 6 (Fig. 3c). We infer from these findings that high external proton 
concentrations reduce Na+ and substrate binding to MhsT. 

In the related NSS-family transporter LeuT, internal Na+ release 
is thought to precede substrate dissociation. We therefore reasoned 
that high luminal concentrations of Na+ should reduce the 
first half-cycle transport rate. Consistent with this model, we found 
that luminal Na+ (50 mM) decreased substrate release from MhsT 
by nearly an order of magnitude (Fig. 3d). When Na+ was present at equal 
concentrations (50 mM) both inside and outside, residual transport 
remained approximately tenfold faster than passive leucine leak 
across the bilayer (Fig. 3d). Lowering the luminal proton concentration 
decreased the apparent first half-cycle transport rate even 
further (Fig. 3d). These data are consistent with Na+-facilitated substrate 
diffusion through MhsT (Extended Data Fig. 9e) and indicate 
that luminal protons can influence the interaction between Na+ and 
the transporter to modulate substrate release from the intracellular 
surface of MhsT. 
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Fig. 4 | Substrate identity affects the return rate of MhsT after substrate 
release to the proteoliposome lumen. a, Substrate titrations in 150 mM Na+ at 
pH 8 of both single-round (darker line) and multi-round (lighter line) transport 
assays for leucine (left), isoleucine (middle) and valine (right), each fit with a 
Michaelis-Menten equation to obtain kinetic parameters as in Extended Data 
Fig. 9b for the multi-round and Fig. 3b for the single-round assay. Individual points at 
2.5 μM indicate the rate of translocation in the absence of transporter in the 


To examine whether Na+ leaves before H+ binding or whether H+ binds 
to the fully loaded transporter to actively facilitate Na+ release, we 
increased the luminal pH in the absence of luminal Na+. Here the 
lower luminal proton concentration significantly decreased 
the single-turnover transport rate (Fig. 3d). We conclude from these 
findings that protons accelerate the release of substrate from MhsT, 
presumably by facilitating Na+ release via the protonation of residues 
in proximity to the Na+-binding sites. 


Rate-limiting steps in MhsT transport 


Using the data from single- and multi-turnover transport assays, we 
set out to infer the rate of the second half of the transport cycle—the 
‘return step’ that restores the inward-facing, substrate-unloaded trans- 
porter to an outward-facing state—by comparing the maximal rates 
of each assay. Using the LIV-BP™ sensor, we measured the maximal 
first half-cycle leucine-transport rate (k_cat) to be about 12.0 ± 0.10 s⁻¹ 
(Fig. 4a). By contrast, using the LIV-BPSS sensor, the k_cat for multi-turnover 
leucine transport was about 1.10 ± 0.03 s⁻¹ (Fig. 4a). This notable 
distinction in single-turnover and multi-turnover rates implies that the 
return step, estimated here to occur at a rate of approximately 1 s⁻¹, is 
rate-limiting to the overall transport cycle. Previous investigations have 
suggested that the return step after substrate release is rate-limiting to 
the NSS transport mechanism. However, this suggestion appears 
to be incompatible with data that show that transporters (such as the 
human serotonin transporter) catalyse the transport of distinct substrates 
(such as dopamine and serotonin) at markedly different rates. 

To determine how different substrates affect rate-limiting steps 
of transport in MhsT, we took advantage of the broad specificity of 
MhsT for hydrophobic amino acids and the similar affinities of the 
LIV-BP sensors for leucine, isoleucine and valine (Fig. 1b, Extended 
Data Fig. 1c, Supplementary Table 1) to assess the single- and multi- 
turnover rates of these three substrates. In these experiments, MhsT 
exhibited first half-cycle and multi-turnover isoleucine-transport rates 
that were similar, but not identical, to those observed for leucine (k_cat 
of 8.14 ± 1.81 s⁻¹ and 1.51 ± 0.05 s⁻¹, respectively) (Fig. 4a). These findings 
indicate that although the return step during isoleucine transport is 
somewhat faster than during leucine transport, this step is still rate-limiting 
for the full transport cycle. A nearly identical k_cat value for 
valine transport was observed with both single- and multi-turnover 
single-round assay. Leak rates were 0.027 s⁻¹, 0.025 s⁻¹ and 0.015 s⁻¹ for leucine, 
isoleucine and valine, respectively. b, Normalized radioactive substrate uptake 
over 3s by F. coli heterologously expressing MhsT (A22C) after pre-incubation 
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(right) and NEM and fit with single exponentials. Data points represent 

mean +s.e.m. (n=3+ experiments). 


assays: 3.61 ± 0.39 and 3.54 ± 0.14 s⁻¹, respectively (Fig. 4a). This result
implies that the return step is much faster, and no longer rate-limiting, 
when valine is the transported substrate. These findings could not be 
explained by differences in the rebinding rates of leucine, isoleucine 
and valine to the intracellular face of the transporter after luminal 
release (Extended Data Fig. 9f, Supplementary Note 4). These insights 
challenge canonical models of NSS transport in which the rate-limiting 
return step is defined as being substrate-free, and instead suggest that 
the rate-limiting return step in the MhsT transport cycle is modulated
by substrate after the transported substrate has been released. 
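
The comparison of first half-cycle and full-cycle rates above can be made concrete with a small calculation. The following is only a minimal sketch, assuming (this is an illustrative assumption, not a model stated in the text) that the cycle consists of two sequential, irreversible steps whose mean durations add, so that 1/k_full = 1/k_first_half + 1/k_return. The function name `return_step_rate` is hypothetical.

```python
def return_step_rate(k_first_half: float, k_full_cycle: float) -> float:
    """Estimate the return-step rate (s^-1) for a two-step sequential cycle.

    Assumption (illustrative only): mean step times add, i.e.
    1/k_full_cycle = 1/k_first_half + 1/k_return.
    """
    if k_full_cycle >= k_first_half:
        raise ValueError("full-cycle rate must be slower than the first half-cycle rate")
    return 1.0 / (1.0 / k_full_cycle - 1.0 / k_first_half)


# Mean rates quoted in the text (s^-1): (first half-cycle, multi-turnover).
measured = {
    "leucine": (12.0, 1.10),
    "isoleucine": (8.14, 1.51),
    "valine": (3.61, 3.54),
}

return_rates = {name: return_step_rate(*pair) for name, pair in measured.items()}

for name, rate in return_rates.items():
    print(f"{name}: return step ~ {rate:.2f} s^-1")
```

Under this assumption the leucine and isoleucine return steps come out close to the full-cycle rates, whereas the valine estimate is far larger than either measured rate. Because the two valine rates overlap within their reported errors, the valine value is poorly constrained and should be read only as "much faster than the first half-cycle".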


MhsT conformation in living cells 


The variable first and second half-cycle reaction rates observed for
different substrates imply that the time-averaged distribution of
MhsT conformations should vary predictably with the identity of the 
transported substrate. To test this hypothesis, we performed live-cell 
sulfhydryl accessibility experiments in which the membrane-permeant 
alkylating agent N-ethylmaleimide (NEM) was used to probe the cyto- 
plasmic accessibility of a cysteine residue (A22C) located within the 
translocation pathway of MhsT and the homologous NSS transporter, 
Tyt1. In these experiments, more time spent, on average, in the inward- 
facing state should result in faster NEM reactivity. As NEM labelling at 
this position blocks transport, increased occupancy of the inward- 
facing state will result in more-rapid inhibition of transport activity. 
Consistent with the predictions of our single-molecule transport assays, 
we found that the rates of MhsT inhibition by NEM pretreatment were 
fastest when transporting leucine, followed by isoleucine and an order 
of magnitude slower when transporting valine (Fig. 4b). These findings 
support our conclusion that the rate of the return step in the transport 
mechanism depends on the identity of the substrate being transported. 


Discussion 

Here, using an unmodified transporter and substrates, we have estab- 
lished a fluorescence-based transport assay with single transporter 
and single substrate sensing resolution to reveal previously inacces- 
sible aspects of the NSS transport mechanism. Using this approach, 
we observed that MhsT functions as a monomer, with a distribution of
transport rates that is unimodal and homogeneous. By leveraging the


capacity to isolate the first half-cycle of transport, we further showed 
that pH has a critical role in modulating substrate binding to, and 
release from, MhsT. 

Determination of the full cycle and first half-cycle transport rates 
revealed that the return of the putatively unloaded transporter to the 
outward-facing state was dependent on the identity of the substrate 
being transported. This step has historically been difficult to discern 
experimentally, and to our knowledge has been inferred only from 
complex experiments and models. It is conceivable that
different substrates are released from different configurations of the 
transporter, with different rates of intracellular gate closing and return 
to the outward-facing state. Such a mechanism would require confor- 
mational memory exerting effects on the time scale of a second, the 
turnover rate that we observed. We find our results more consistent 
with reports of a second substrate (S2) binding site in the extracellular
vestibule of MhsT and the prokaryotic NSS homologue LeuT. In this
framework, substrate binds in the S2 site to allosterically trigger inward
opening and Na⁺ and S1 substrate release, but remains trapped until
the transporter isomerizes to the outward open state, thereby modu- 
lating the rate of the return step. We verified our model in living cells 
through cysteine accessibility measurements, which demonstrate that 
the time-averaged occupancy of the inward open state of MhsT during 
active transport is substrate-dependent and directly correlated with 
the return-step rate. In principle, such conformational information 
could be used by the cell to respond to different nutrients in a manner
akin to signalling proteins.

Notably, substrate transport from the extracellular face of MhsT to 
the intracellular face was robust, whereas the ability of MhsT to move 
substrate from the inside face to the outside face (whether in prote- 
oliposomes or membrane vesicles) was negligible—despite favourable 
electrochemical gradients. Hence, in contrast to other secondary trans- 
porters, we conclude that MhsT can catalyse rapid substrate influx in 
the presence of appropriate electrochemical gradients, but not efflux, 
and that the S2 binding site at the extracellular face of the transporter 
is critical to vectorial substrate transport across the lipid bilayer. 

Notably, amphetamine-induced dopamine efflux by the dopamine 
transporter, a mammalian NSS homologue, is dependent on phosphorylation
of the N terminus of the transporter, which suggests
that the transmembrane domain alone is inefficient at mediating
efflux. Although the underlying mechanism is at present unknown, 
this extended N-terminal segment is absent in MhsT, which appears 
incapable of efflux. 

The successful investigation of the secondary transport cycle using 
single-molecule techniques offers the potential to apply analogous 
methods to investigations of the multitude of primary and second- 
ary transporters. Such methods may be particularly advantageous for 
biophysical investigations of mammalian transporters, which are more 
prone to biochemical and functional heterogeneities that compromise 
quantitative ensemble investigations. Applications of this kind are, in 
principle, limited only by the ability to functionally reconstitute trans- 
porters into proteoliposomes and the availability of sensors specific to 
the substrates being transported. The latter of these challenges can be 
overcome by enlisting the diverse array of periplasmic binding proteins 
for unique substrates that exists in nature, RNA aptamers, receptor
soluble domains or synthetic engineering or evolution efforts that can,
in theory, be implemented to recognize any arbitrary small molecule.


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 
Extended Data Fig. 2 | Single-molecule imaging of LIV-BPWT and LIV-BPSS
reveals kinetic determinants of analogue and digital responses.

a, b, Representative traces of LIV-BPWT (a) and LIV-BPSS (b) imaged at 100-ms
time resolution in the absence of substrate. c, d, LIV-BPWT (c) and LIV-BPSS (d)
variants imaged in the presence of leucine at the Kd of each variant (40 nM and
5.6 µM, respectively) and at the indicated time resolution in milliseconds. FRET
values from all selected traces (count at top right of each panel) summed into
time-dependent population FRET histograms, represented as contour plots.
Scale bar is shown at the right. Two distinct FRET peaks are apparent at all time
resolutions for LIV-BPWT; LIV-BPSS displayed only a single peak at low time
resolution (≥100 ms) that resolved into two distinct populations in the
millisecond regime. e–h, LIV-BPWT and LIV-BPSS imaged at time resolutions that
most completely sampled the FRET transitions (25 ms and 0.25 ms,
respectively) and idealized using the segmental k-means algorithm
(Supplementary Methods). Dwell-time distributions of LIV-BPWT in the low- (e)
and high-FRET (f) states, and LIV-BPSS in the low- (g) and high-FRET (h) states, in
the presence of the indicated concentrations of leucine. All experiments were
performed at least three times with similar results. i, j, Rate constants derived
from maximum likelihood analysis of dwell times in the low- (black squares)
and high-FRET (red circles) states, fit to lines to determine ligand association
(kon) (black lines) and ligand dissociation (koff) (red lines) rate constants for
LIV-BPWT (i) and LIV-BPSS (j). As expected for a bimolecular interaction, the
ligand-binding rate increased linearly with ligand concentration, whereas the
ligand-dissociation rate remained constant. kon values were similar in LIV-BPWT
and LIV-BPSS at 77 and 30 µM⁻¹ s⁻¹, respectively, whereas koff values differed by
nearly two orders of magnitude at 4.0 and 212 s⁻¹, respectively. Together, these
results are consistent with the observed 100-fold difference in binding affinity
in the two sensor variants (Fig. 1c). All experiments were performed at least
three times with similar results.
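
As a consistency check on the rate constants quoted in this caption, the equilibrium dissociation constant of a simple one-step bimolecular binding reaction is the ratio of the dissociation to the association rate constant. The sketch below assumes such a one-step scheme; the function name `dissociation_constant_um` is hypothetical, and the inputs are the mean values from the caption.

```python
def dissociation_constant_um(k_off_per_s: float, k_on_per_um_per_s: float) -> float:
    """Kd (in µM) for one-step binding: Kd = k_off / k_on."""
    if k_on_per_um_per_s <= 0:
        raise ValueError("association rate constant must be positive")
    return k_off_per_s / k_on_per_um_per_s


# Values from the caption: k_on in µM^-1 s^-1, k_off in s^-1.
kd_tight = dissociation_constant_um(k_off_per_s=4.0, k_on_per_um_per_s=77.0)
kd_weak = dissociation_constant_um(k_off_per_s=212.0, k_on_per_um_per_s=30.0)
fold_difference = kd_weak / kd_tight

print(f"tight-binding variant: Kd ~ {kd_tight * 1000:.0f} nM")
print(f"weak-binding variant:  Kd ~ {kd_weak:.1f} µM")
print(f"fold difference:       ~ {fold_difference:.0f}")
```

The kinetically derived values land within the same order of magnitude as the 40 nM and 5.6 µM affinities quoted for the two variants, and the ratio is on the order of 100-fold, in line with the caption.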
Extended Data Fig. 3 | LIV-BP rapidly binds and responds to ligand.

a, Schematic of His-tagged LIV-BPSS directly surface-immobilized.
b, Representative fluorescence (donor in green and acceptor in red) and FRET
(blue) traces from a single LIV-BPSS sensor imaged at 15-ms time resolution
during the rapid delivery (vertical dashed line) of 10 µM (subsaturating)
leucine. c, Ensemble-average FRET efficiency (symbols), fit to a single
exponential function with a time constant of approximately 16 ms (red line).
d, FRET values of an ensemble of surface-immobilized LIV-BPSS molecules
summed into a contour plot (scale at right), demonstrating rapid and uniform
response to leucine addition. e, Schematic of His-tagged LIV-BPWT directly
surface-immobilized. f, Representative fluorescence (donor in green and
acceptor in red) and FRET (blue) traces from a single LIV-BPWT sensor imaged
at 10-ms time resolution during the rapid delivery (vertical dashed line) of
3 µM (saturating) leucine. g, Ensemble-average FRET efficiency of leucine
(grey squares), isoleucine (red circles) or valine (blue triangles). Data are fit to a
single exponential function with a time constant of approximately 27 ms for
leucine (grey line), 25 ms for isoleucine (red line) and 32 ms for valine (blue line).
h, FRET histogram of an ensemble of surface-immobilized LIV-BPWT molecules
responding to leucine summed into a contour plot (scale at right),
demonstrating rapid and uniform response to leucine addition. i, Schematic of
LIV-BPWT encapsulated within the lumen of liposomes in identical conditions to
those used for transport assays (Supplementary Methods). Liposomes were
pre-incubated in 100 µg ml⁻¹ α-haemolysin for 15 min at room temperature.
j, Encapsulated LIV-BPWT FRET response to injection of 3 µM leucine, in which
the time of mixing is marked (dashed grey line). The time constant of the
observed FRET response (about 23 ms), fit as above (green line), was identical
to LIV-BPWT directly immobilized to the surface. All experiments were
performed at least three times with similar results.
Extended Data Fig. 4 | Quantifying the timing of transport initiation. a, The
precise timing of ligand injection was estimated by co-injecting a low
concentration (0.5 nM) of Cy5-labelled 21-mer DNA duplex (Supplementary
Methods) with the injected solution of interest, and measuring the increase in
background fluorescence on the Cy5 channel in regions far from immobilized
particles. The mean background fluorescence is shown, from a representative
experiment in which LIV-BPWT-containing liposomes that lack MhsT were
imaged before and after (vertical dashed line) the injection of Cy5-labelled DNA
tracer, with the exact time of injection determined by the midpoint of the step-like
increase in fluorescence. Inset, zoomed-in view of the period immediately
before and after injection. b, c, Representative single-molecule fluorescence
(donor in green and acceptor in red) and FRET (blue) traces (b) and FRET
contour plot (c) of encapsulated LIV-BPWT from these experiments,
demonstrating minimal changes in apparent FRET efficiency with co-injection
of low concentrations of the Cy5 fluorophore. Experiments were performed at
least three times with similar results.
Extended Data Fig. 5 | Representative traces of multi-round and single-turnover
assays with simulations of multi-round activity. a, b, Single-molecule
transport traces with 50 mM Na⁺ and leucine (5 µM and 0.1 µM for a
and b, respectively) at external pH 8 and internal pH 6, with the time of injection
indicated by a dashed grey vertical line. a, Representative single-molecule
fluorescence (top) (donor in green and acceptor in red) and FRET (bottom)
(blue) traces and fits to exponential functions (red) from experiments imaging
LIV-BPSS encapsulated within proteoliposomes that contain MhsT. b, As in
a, but for the single-turnover assay using LIV-BPWT and state assignments in red
(bottom panels). Traces shown are representative of experiments performed
at least three times. c, Representative simulated FRET traces generated by a
model in which distinct states correspond to the distinct FRET values that
would be reported by the sensor, using the calibration curve according to the
number of substrate molecules in the liposome (Extended Data Fig. 1). We
assume transport occurs at a rate of one event per second and is irreversible, so
transitions to lower-FRET states (state 1 to state 0) are not allowed. d, Noise that
mimics experimental noise was added to representative FRET traces (different
individual traces to those shown in c), which masks the individual steps
observable in a. This indicates that, while in ideal circumstances we should be
able to monitor individual transport events in this assay, in practice such an
analysis would be difficult. e, Experimental FRET traces with the same apparent
transport rate as the simulated data demonstrate a notable likeness to the
simulated traces. Over 1,000 traces were simulated with similar results and
representative traces were taken from experiments repeated at least 3 times.
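
The kind of simulation described in panels c and d can be illustrated with a short sketch. This is not the authors' simulation code: it assumes exponentially distributed waiting times between irreversible transport events (a Poisson process, which the caption does not state explicitly), tracks only the number of accumulated substrate molecules rather than converting to FRET through the sensor calibration curve, and uses Gaussian noise as a stand-in for experimental noise. All function names and parameter values are hypothetical.

```python
import random


def simulate_transport_counts(rate_per_s, duration_s, dt_s, seed=None):
    """Cumulative number of transported molecules at the end of each frame.

    Events are irreversible, so the count never decreases.
    """
    rng = random.Random(seed)
    # Draw event times with exponential waiting times (assumed Poisson process).
    event_times = []
    t = rng.expovariate(rate_per_s)
    while t < duration_s:
        event_times.append(t)
        t += rng.expovariate(rate_per_s)

    n_frames = int(round(duration_s / dt_s))
    counts = []
    seen = 0
    for frame in range(n_frames):
        frame_end = (frame + 1) * dt_s
        while seen < len(event_times) and event_times[seen] <= frame_end:
            seen += 1
        counts.append(seen)
    return counts


def add_gaussian_noise(trace, sigma, seed=None):
    """Add zero-mean Gaussian noise to every point of a trace."""
    rng = random.Random(seed)
    return [value + rng.gauss(0.0, sigma) for value in trace]


# One event per second on average, 10 s of data at 100-ms frames.
counts = simulate_transport_counts(rate_per_s=1.0, duration_s=10.0, dt_s=0.1, seed=1)
noisy = add_gaussian_noise(counts, sigma=0.5, seed=2)
```

Plotting `counts` gives a staircase of discrete steps, whereas `noisy` shows how noise comparable to the step size can obscure individual events, which is the qualitative point made in panel d.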
Extended Data Fig. 6 | ³H-leucine uptake activity of wild-type MhsT, and
immobilization mutants and membranes with and without MhsT.

a, Turnover rates of purified and reconstituted wild-type MhsT (grey) and
Cy7-labelled MhsT(S3C) (red) and MhsT(N452C) (blue) assessed at a series of
external leucine concentrations. Leucine uptake by wild-type MhsT exhibited a
Km of 0.93 ± 0.08 µM and a Vmax of 0.83 ± 0.02 substrate molecules per second.
MhsT(S3C) labelled with a Cy7 fluorophore had a Km of 0.94 ± 0.11 µM and a Vmax
of 0.82 ± 0.03 substrate molecules per second. MhsT(N452C) labelled with a
Cy7 fluorophore had a Km of 0.90 ± 0.08 µM and a Vmax of 0.84 ± 0.02 substrate
molecules per second. b, Leucine uptake by MQ614 cells expressing wild-type
MhsT (grey), MhsT(P86C) or MhsT(G87C) in the presence of 150 mM Na⁺ at
pH 8.5. Mean ± s.e.m. (n = 3 experiments). c, Inside-out and outside-out vesicles
prepared (Supplementary Methods) and assayed for uptake activity in the
presence and absence of the MhsT transporter as indicated. Transport is
observed only in the presence of MhsT in the outside-out orientation.
d, Radioactive glycine uptake by the native CycA glycine:H⁺ symporters in
inside-out or outside-out prepared vesicles in the presence of an inwardly
directed proton gradient. Both preparations show robust activity, indicating
that both vesicle preparations are intact and contain functional transporters.
Lactic acid was added (arrow) to vesicles during the glycine-uptake time course
to establish a proton gradient relative to the vesicle orientation. This creates an
inwardly directed proton gradient in outside-out vesicles, and the opposite in
inside-out vesicles. We observe an increase in the rate of glycine transport in
outside-out vesicles and a marked decrease in inside-out vesicles, as expected,
which demonstrates that we have prepared the vesicles in the indicated
orientation. Mean ± s.e.m. (n = 3 experiments). e, The Vmax and Km of leucine
uptake by wild-type MhsT were measured at a series of external leucine
concentrations for the indicated periods of time. Assays were performed with
proteoliposomes that contain known amounts of MhsT prepared at protein-to-lipid
reconstitution ratios of 1:150 (w/w) (solid symbols) or 1:300 (w/w) (open
symbols) for time periods of 2, 3, 5 or 10 s (total decays per minute at the
corresponding time points were background-corrected for decays per minute
determined at 0 s). The partially filled square indicates the virtual overlap of
data points. The specific radioactivity-decays per minute correlation was
verified using known amounts of ³H-leucine. Shorter sampling times yielded
higher turnover rates and lower Km values (the highest Vmax of 1.04 ± 0.02 s⁻¹ and
lowest Km of 0.21 ± 0.01 µM were determined at the 2-s sampling time), but the
technically challenging nature of these experiments precluded further
shortening of the sampling time. To ensure the reliable determination of the
Vmax and Km in radiolabelled uptake measurements, a sampling time of 3 s was
chosen. Mean ± s.e.m. (n = 3 replicates of 2 protein preparations).
Extended Data Fig. 7 | Determination of the number of transporters per
liposome. a, The number of Cy7-labelled transporters observed in each 
liposome. Reported values were corrected for a labelling efficiency of 75% to 
determine the final estimate of the number of transporters per liposome. Black 


symbols represent individual data points; bars represent mean ± s.e.m.
(n = 3 experiments). b, c, Representative Cy7 fluorescence (left) and FRET
(right) traces of a liposome with a single transporter (b) or two transporters (c).
Traces are representatives of experiments performed at least three times. 
Extended Data Fig. 8 | Orientation-controlled single-turnover transport.

a, Schematics and corresponding contour plots of the single-turnover assay 
with liposomes immobilized by the S3C (inside-out) or N452C (outside-out) 
residues of MhsT. The third panel from the left shows the passive diffusion of 
leucine in the absence of Na⁺. The right panel shows transport data from
proteoliposomes that contain a single (mixed orientation) MhsT transporter 
immobilized by His-tagged lipids. Occupancy of the high-FRET state following 
injection of leucine (grey dashed line) represents the proportion of vesicles in 
which transport has occurred. The N in the top right corner indicates the total
number of traces recorded over three separate experiments. b, Left, schematic 


of proteoliposome immobilization by biotin tags added at position P86C or 
G87C of MhsT, which result in inside-out orientations of the MhsT transporter 
(as for S3C). Right, contour plots of the single-turnover assay when 
immobilizing via biotin-P86C or biotin-G87C. The N in the top right corner
indicates the total number of traces over three separate experiments. 

c, Cumulative distributions of representative movies demonstrating the low 
translocation rates of the S3C (inside-out) immobilized liposomes with all 
three substrates, which match the leak rate in the absence of Na⁺. Experiments
were repeated at least three times with similar results. 
Extended Data Fig. 9 | Michaelis-Menten kinetic parameters of MhsT
transport. a, FRET values were transformed into units of proteoliposome
luminal leucine concentration (Supplementary Methods) and fit to linear
functions (lines) to determine transport initial rates, from experiments in the
presence of the indicated concentrations of external leucine. Unless otherwise
stated, experiments were performed with external 150 mM Na⁺, 1 µM leucine
and pH 8 outside with pH 6 inside, and with choline chloride used to maintain
osmotic balance. b, Substrate accumulation rates from a (black squares)
were fit to a Michaelis-Menten function (black line) with a Km of 0.06 ± 0.01 µM
and a Vmax of 1.18 ± 0.04 s⁻¹ with a Hill coefficient of 0.69 ± 0.08. Analogous
radioactive uptake experiment (red circles) fit with a Michaelis-Menten
function (red line) with a Km of 0.19 ± 0.02 µM and a Vmax of 0.99 ± 0.03 s⁻¹ with a
Hill coefficient of 1.85 ± 0.28. c, As in b, but varying the external concentration
of Na⁺. Black line is a fit to a Hill equation with a Km of 69.9 ± 4.25 mM and a Vmax of
1.14 ± 0.05 s⁻¹ with a Hill coefficient of 2.03 ± 0.17. Analogous radioactive uptake
experiment (red circles) fit with a Hill equation with a Km of 63 ± 15 mM, Vmax of
0.74 ± 0.11 µM s⁻¹ and Hill coefficient of 1.61 ± 0.40. d, Substrate accumulation
rates in the presence of the indicated external pH (dark grey, pH 6 inside) and
internal pH (light grey, pH 8 outside) and 150 mM Na⁺ and 1 µM leucine. The dark
grey pH 8 bar and the light grey pH 6 bars are duplicated for comparison
purposes. e, Substrate accumulation rates with 0.1 µM leucine pH 8 and 50 mM
Na⁺ external solution and in the presence and absence of 50 mM Na⁺ at pH 6 and
pH 8 of the internal solution. Robust transport is observed at both internal pH 6
and pH 8 in the absence of internal Na⁺ (red). Addition of internal Na⁺
completely abolishes transport (purple), with changes in internal pH having
little to no further effect (cyan). Mean ± s.e.m. (n = 3+ experiments). f, Multi-round
transport assay with 150 mM external Na⁺, 1 µM leucine and pH 8 with
(red) and without (black) internal 50 µM tryptophan at pH 6. There is no
significant difference between the two conditions when tested using a two-sided
Student’s t-test that was not corrected for multiple hypotheses at a 95%
confidence interval. Black diamonds represent individual data points, and bars
represent mean ± s.e.m. (n = 3 experiments).
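
The saturation functions named in this caption have a standard closed form, sketched below for reference. The Hill equation reduces to the Michaelis-Menten equation when the Hill coefficient is 1. The function name `hill_rate` is hypothetical; the example inputs are the mean parameters quoted in the caption for the single-molecule Na⁺ titration, and the caption does not specify the authors' fitting procedure, which is not reproduced here.

```python
def hill_rate(substrate: float, v_max: float, k_half: float, hill_n: float = 1.0) -> float:
    """Transport rate from the Hill equation.

    v = v_max * S^n / (K^n + S^n); with n = 1 this is Michaelis-Menten.
    substrate and k_half must share the same concentration units.
    """
    if substrate < 0 or k_half <= 0:
        raise ValueError("concentrations must be non-negative and k_half positive")
    s_n = substrate ** hill_n
    return v_max * s_n / (k_half ** hill_n + s_n)


# Parameters quoted for the Na+ dependence (mM, s^-1, dimensionless).
V_MAX = 1.14
K_HALF_MM = 69.9
HILL_N = 2.03

rate_at_half_saturation = hill_rate(K_HALF_MM, V_MAX, K_HALF_MM, HILL_N)
rate_at_150_mm = hill_rate(150.0, V_MAX, K_HALF_MM, HILL_N)

print(f"rate at K: {rate_at_half_saturation:.2f} s^-1")
print(f"rate at 150 mM Na+: {rate_at_150_mm:.2f} s^-1")
```

By construction the rate at the half-saturation concentration is half of the maximal rate, and with these parameters the rate at 150 mM Na⁺ remains somewhat below the maximal rate.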
Extended Data Fig. 10 | Proton leak into proteoliposomes monitored by
lipid-linked pHrodo fluorophores. Buffer was exchanged from pH 6 to pH 8
(outside) and a step-like approximately 50% decrease in fluorescence was 
observed, which we interpret as the quenching of lipid-linked pHrodo on the 
liposome exterior, followed by a slow decay in fluorescence (black squares). 
The data were fit with a single exponential function (red line) with a time
constant of about 33 s, which we interpret as the slow leak of protons into the
liposome. The gaps in the data are periods in the absence of illumination, to 
rule out photobleaching as a cause of the fluorescence decay we observed. 
These experiments confirm that the pH gradient was maintained during the 
full time-frame in which we measure transport. Experiments were performed 
three times with similar results. 
Data collection Commercial products were used (LabVIEW 2017)


Data analysis All software are commercial products (OriginPro 8, ImageJ 1.4, and 
MATLAB R2018b) or otherwise publicly available (SPARTAN). 
Life sciences study design


All studies must disclose on these points even when the disclosure is negative. 


Sample size Sample sizes of 500-1,000 molecules per condition/repeat were determined by the normal throughput of the instrumentation and were 
found to adequately sample the distribution of behaviors in the ensemble. Statistical methods were not used to determine sample sizes. 
Standard replicate sizes in the field were used (n=3+) 


Data exclusions Exclusion criteria to remove clear artifacts in single molecule traces were predefined
and uniformly applied, as described in the manuscript. 


Replication All findings were reliably replicated on separate days with fresh buffer solutions and frozen sample aliquots from multiple preparations. All 
results shown were successfully replicated. 
Randomization Samples were not randomized as the experimental procedure is systematic, the selection criteria were uniform across experiments and data
analysis was largely automated and therefore identical across replicates. Additionally, the large number of total replicates performed in the
study (300+) would have made randomization infeasible.


Blinding Blinding was not performed as the experimental procedure is systematic, the selection criteria were uniform across experiments and data 
analysis was largely automated and therefore identical across replicates. Additionally, the large number of total replicates performed in the
study (300+) would have made randomization infeasible. 
The cytochrome b6f (cytb6f) complex has a central role in oxygenic photosynthesis,
linking electron transfer between photosystems I and II and converting solar energy
into a transmembrane proton gradient for ATP synthesis. Electron transfer within
cytb6f occurs via the quinol (Q) cycle, which catalyses the oxidation of plastoquinol
(PQH2) and the reduction of both plastocyanin (PC) and plastoquinone (PQ) at two
separate sites via electron bifurcation. In higher plants, cytb6f also acts as a
redox-sensing hub, pivotal to the regulation of light harvesting and cyclic electron transfer
that protect against metabolic and environmental stresses. Here we present a 3.6 Å
resolution cryo-electron microscopy (cryo-EM) structure of the dimeric cytb6f
complex from spinach, which reveals the structural basis for operation of the Q cycle
and its redox-sensing function. The complex contains up to three natively bound PQ
molecules. The first, PQ1, is located in one cytb6f monomer near the PQ oxidation site
(Qp) adjacent to haem bp and chlorophyll a. Two conformations of the chlorophyll a
phytyl tail were resolved, one that prevents access to the Qp site and another that
permits it, supporting a gating function for the chlorophyll a involved in redox
sensing. PQ2 straddles the intermonomer cavity, partially obstructing the PQ
reduction site (Qn) on the PQ1 side and committing the electron transfer network to
turnover at the occupied Qn site in the neighbouring monomer. A conformational
switch involving the haem cn propionate promotes two-electron, two-proton
reduction at the Qn site and avoids formation of the reactive intermediate
semiquinone. The location of a tentatively assigned third PQ molecule is consistent
with a transition between the Qp and Qn sites in opposite monomers during the Q
cycle. The spinach cytb6f structure therefore provides new insights into how the
complex fulfils its catalytic and regulatory roles in photosynthesis.


Photosynthesis sustains life on Earth by converting light into chemical
energy in the form of ATP and NADPH, producing oxygen as a by-product.
Two light-powered electron transfer reactions at photosystems
I and II (PSI and PSII) are linked via the cytb6f complex to form the
‘Z-scheme’ of photosynthetic linear electron transfer (LET). Cytb6f
catalyses the rate-limiting step in LET, coupling the oxidation of PQH2
and reduction of PC and PQ to the generation of a transmembrane proton
gradient, which is used by ATP synthase to make ATP. The cytb6f
complex is analogous to the cytbc1 complex found in mitochondria
and anoxygenic photosynthetic bacteria, and both operate via the
modified Q cycle. The cytb6f and cytbc1 complexes are dimeric and
have similarly arranged electron transfer cofactors, comprising a 2Fe-2S
cluster, two b-type haems and a c-type haem. However, crystallographic
structures of cyanobacterial and algal cytb6f complexes have revealed
additional cofactors that are not found in cytbc1 complexes, including
chlorophyll a, 9-cis β-carotene and an additional c-type high-spin
haem. The Q-cycle involves bifurcated transfer of two electrons,
derived from oxidizing a lipophilic PQH2 molecule at the Qp-binding
site, into the high- (2Fe-2S, cytf) and low- (cytbp, bn and cn) redox potential
pathways, whereas the two protons enter the thylakoid lumen. The
high-potential pathway delivers an electron to a membrane-extrinsic
soluble acceptor protein, PC, destined for PSI, while the low-potential
pathway delivers its electron to a PQ molecule bound at the Qn site near
the stromal side of the membrane. Oxidation of a second PQH2 at the
Qp site culminates in the two-electron reduction of a Qn-site-bound PQ,
which, together with two proton transfers from the stroma, regenerates
PQH2. The Q-cycle thereby doubles the number of protons transferred
to the lumen per PQH2 oxidized. Yet, full understanding of the Q-cycle
mechanism is hindered by a lack of information on the binding of the
substrate PQ/PQH2 molecules within the complex.

In addition to its role in LET, cytb6f also plays a key part as a redox
sensing hub involved in the regulation of light harvesting and cyclic 
Fig. 1 | Cryo-EM structure of the cytb6f complex from spinach. a–c, Views of
the colour-coded cytb6f density map showing cytb6 (green), cytf (magenta), ISP
(yellow), subunit IV (cyan), PetG (grey), PetM (pink), PetN (pale orange) and PetL
(pale purple). Detergent and other disordered molecules are shown in
semi-transparent light grey. a, View in the plane of the membrane. The grey stripe
indicates the probable position of the thylakoid membrane bilayer. b, View


electron transfer (CET), which optimize photosynthesis in fluctuating
light environments. Cytb6f communicates the redox status of the PQ
pool to the loosely associated light harvesting complex II (LHCII) kinase,
STN7. Phosphorylation of LHCII results in a decrease in thylakoid
membrane stacking, promoting the exchange of LHCII between PSII
and PSI to balance their relative excitation rates and regulate CET.
CET involves the reinjection of electrons from ferredoxin into the PQ
pool, generating a proton gradient for photoprotective downregulation
of PSI and PSII or to augment ATP synthesis, without net formation of
NADPH. The cytb6f complex has been proposed to fulfil the role of
the ferredoxin-PQ oxidoreductase (FQR) during CET, with the stromal-facing
haem cn suggested to channel electrons from ferredoxin NADP⁺
reductase (FNR) bound ferredoxin to the Qn-site PQ. How cytb6f performs
these central redox-sensing regulatory roles remains unclear.

Genetic manipulation of photosynthetic regulation is now recog- 
nized as being key to increasing crop yields to feed a global population 
projected to approach 10 billion by 2050. Indeed, overproduction of
the Rieske iron-sulfur protein (ISP) of cytb6f in Arabidopsis thaliana
led to a 51% increase in yield. Further progress in understanding the
regulatory roles of cytb6f and potentially manipulating them for crop
improvement requires knowledge of the structure of the higher plant
complex. Here, using a gentle purification procedure to obtain a highly
active dimeric complex (Extended Data Fig. 1) and single-particle cryo-EM,
we resolve the cytb6f complex from Spinacia oleracea (spinach) at
3.6 Å resolution (Extended Data Fig. 2, Extended Data Table 1).

The colour-coded map (Fig. 1a–c) shows the architecture of this
dimeric complex surrounded by a disordered density comprising detergent
and lipid molecules. The overall organization of this higher plant
cytb6f complex is similar to crystallographic structures of algal and
cyanobacterial complexes from Chlamydomonas reinhardtii (Protein
Data Bank (PDB): 1Q90), Mastigocladus laminosus (PDB: 1VF5) and Nostoc
sp. PCC 7120 (PDB: 2ZT9) (Extended Data Table 2). Each monomeric
unit of the cytb6f complex comprises four large polypeptide subunits
that contain redox co-factors (cytf, cytb6, ISP and subunit IV), and four
small subunits (PetG, PetL, PetM and PetN). Extended Data Figure 3
shows the density and structural model for each subunit. The extrinsic 
perpendicular to the membrane plane from the lumenal (p) side. c, View
perpendicular to the membrane plane from the stromal (n) side. d–f, Modelled
subunits of cytb6f shown in a cartoon representation and coloured as in a–c. d,
View in the plane of the membrane. e, View perpendicular to the membrane
plane from the lumenal side. f, View perpendicular to the membrane plane from
the stromal side.


domains of cytf and the ISP on the lumenal face of the complex flank
the membrane-integral cytb6 subunits (Fig. 1a, b). The organization of
the transmembrane integral subunits can be seen on the stromal side
of the complex (Fig. 1c), with 13 transmembrane helices visible within
each monomer (Fig. 1d–f). Peripheral to the core of cytb6 (four
transmembrane helices) and subunit IV (three transmembrane helices) on
the long axis of the complex is the single kinked transmembrane helix
of the ISP that crosses over to provide the soluble ISP domain of the
neighbouring monomer. The single transmembrane helix belonging
to cytf is sandwiched by the transmembrane helices of the four minor
subunits PetG, PetL, PetM and PetN, which form a ‘picket fence’ at the
edge of the complex.

Figure 2a, b shows the organization of the prosthetic groups and
lipids, with four c-type haems (f and cn, dark blue), four b-type haems
(bp and bn, red), two 2Fe-2S clusters (burnt orange and yellow), two 9-cis
β-carotenes (orange), two chlorophyll a molecules (green), three PQ
molecules (yellow) and twelve bound lipids (two monogalactosyl
diacylglycerol, four phosphatidylglycerol, three sulfoquinovosyl
diacylglycerol and three phosphatidylcholine, all shown in white). Extended
Data Figure 4 shows the density map and structural model for each
prosthetic group. Figure 2c shows all of the bound electron transfer
cofactor edge-to-edge distances within the cytb6f complex. Electron
transfer from the 2Fe-2S cluster is thought to involve movement of the
lumenal ISP domain, pivoting between closer association with the Qp
site and the haem f. In comparison to the chicken cytbc1 complex, in
which the two conformations of the ISP were resolved, the ISP and
bound 2Fe-2S cluster in the spinach cytb6f structure appear to be in
the distal position with respect to haem f, as in the existing algal and
cyanobacterial cytb6f structures (Extended Data Table 2). PQ locations
are generally inferred from crystallographic structures containing
tightly bound quinone analogue inhibitors. The cryo-EM structure
was obtained with native PQ molecules (Fig. 2d), clearly defined by
their respective densities (Extended Data Fig. 4); their distances from
the nearest cofactors are shown in Fig. 2e–g. One PQ molecule (PQ1)
is adjacent to the haem bp and chlorophyll on one side of the dimer
(Fig. 2e), and a second (PQ2) binds adjacent to the haem cn–haem bn


Fig. 2 | The global arrangement of prosthetic groups, lipids and
plastoquinone molecules in the spinach cytb6f complex. a, b, The
arrangement of molecules in the cytb6f complex viewed in the membrane
plane (a) and perpendicular to the membrane plane from the stromal side (b).
Chl, chlorophyll a; bp, haem bp; cn, haem cn; bn, haem bn; β-car, 9-cis β-carotene;
FeS, 2Fe-2S. c, d, Cofactors and edge-to-edge distances (in Å) in the dimeric
cytb6f complex. e, The location of the 1,4-benzoquinone ring of PQ1 adjacent
to haem bp, the 2Fe-2S centre and two conformations of the chlorophyll


pair on the opposite monomer to PQ1 (Fig. 2f). A third and less clearly
defined PQ (PQ3) lies between the haem cn of one monomer and the
haem bn of the other (Fig. 2g). The density map in this region could also
be interpreted as a phospholipid; Extended Data Fig. 6 shows the two
possible fits, to a plastoquinone or to a lipid.

The 1,4-benzoquinone ring of PQ1 is 16.2 Å from haem bp and 26.4 Å
from the 2Fe-2S cluster (Fig. 2e), and distal to the Qp quinone-oxidizing
site defined in the M. laminosus cytb6f structure (PDB: 4H13) by
the inhibitor tridecylstigmatellin (Fig. 3a, b). The Qp site is located in a
pocket formed by hydrophobic residues from subunit IV (Val84, Leu88,


molecule, represented in two shades of green. f, Close-up of the
1,4-benzoquinone ring of PQ2 and the nearby haem cn and haem bn near the
stromal face of the complex. g, The 1,4-benzoquinone ring of PQ3, which sits
between the haem cn and haem bn from the two cytb6f monomers. The cytb6f
complex is coloured as in Fig. 1, and shows c-type haems (f and cn, dark blue),
b-type haems (bp and bn, red), 9-cis β-carotene (orange), chlorophyll a (green),
2Fe-2S (burnt orange and yellow), lipids (white) and plastoquinones (PQ1–PQ3;
yellow).


Val98 and Met101) and cytb6 (Phe81, Val126, Ala129, Val133, Val151 and
Val154) (Fig. 3c). Bifurcated electron transfer to the 2Fe-2S cluster and
haem bp involves two deprotonation events mediated by the ISP His128
and subunit IV Glu78 residues, which are buried inside the Qp pocket
(Fig. 3a, b). The OH group of PQ1 is ~26 Å from His128, a ligand of the
2Fe-2S cluster (Fig. 3b), so PQ1 is unlikely to be oxidized in its resolved
position, which probably represents a snapshot of its approach to the
Qp site. It is notable in this regard that our spinach cytb6f structure
resolves two conformations of the chlorophyll phytyl tail, one of which
permits access to the Qp site and one that restricts it (Fig. 3c, d). There is


Fig. 3 | Conformational alterations in the chlorophyll phytyl chain at the
PQH2-oxidizing Qp site. a, Orientation of the PQ1 in relation to the haem bp,
chlorophyll and 2Fe-2S cofactors. The catalytically essential residue E78 and
coordinating residues of the 2Fe-2S cofactor are shown. Tridecylstigmatellin
(TDS) is a quinone analogue, superimposed according to its position
determined in the cyanobacterial complex (PDB: 4H13), and used here to
indicate the probable destination of PQ1 in the Qp pocket. b, The same
cofactors and residues as in a, but in relation to a surface view of cytb6 (green)
and subunit IV (sub IV, cyan). c, The Qp pocket is highlighted with a red dashed
line, showing its position in relation to the chlorophyll and PQ1 molecules; the
hydrophobic residues of subunit IV and cytb6 that line the pocket are
shown as sticks and coloured cyan and green, respectively. d, The two
conformations of the chlorophyll tail (represented in dark green and light
green) gate (dashed arrow) the entrance to the Qp pocket.
Fig. 4 | The intermonomer cavity of the spinach cytb6f complex. a, b, Surface
representations of the complex, with subunits coloured as in Fig. 1, and
cofactors and lipids coloured as in Fig. 2. These two views of the complex are
related by a 45° rotation about an axis perpendicular to the membrane to show
two views of the cavity and the locations of PQ molecules. c, PQ1–PQ3 are
shown in relation to the bp, cn and bn haems in the core of the complex, viewed in
the membrane plane. d, The complex viewed from the stromal side of the


only one position of the phytyl tail for the chlorophyll on the opposing
monomer. The bound chlorophyll adjacent to PQ1 may fulfil a gating
function at the Qp pocket, either controlling access of PQH2 and/or
increasing the retention time of the reactive semiplastoquinone (SPQ)
intermediate species formed following electron transfer to the 2Fe-2S
cluster. Indeed, spin-coupling between the SPQ and the 2Fe-2S cluster
has been detected during enzymatic turnover of cytb6f but is absent
in cytbc1 complexes that lack the chlorophyll molecule. SPQ in the
2Fe-2S-bound state does not react with oxygen, providing a potential
mechanism to control the release of superoxide from the Qp site and
regulate the activity of the LHCII kinase STN7, which is proposed to
bind to cytb6f between transmembrane helices F and H of subunit IV.
Another role for chlorophyll in regulating the activity of STN7 could
involve PQH2 displacing the chlorophyll phytyl tail on binding to the
Qp site; this motion could induce a conformational change in STN7
leading to its activation.

PQ2 binds towards the stromal face of the complex, 4.4 Å from the
haem cn–bn pair at the Qn reducing site (Fig. 2f). The bn and cn haems on
each monomer are separated by 4.9 Å, with the bn haem coordinated by
His202 and His100 (cytb6), whereas the vinyl side-chain of haem cn is
covalently linked to Cys35 (cytb6) (Fig. 2f). The dimerization interface
of the cytb6f complex creates a cavity, which is proposed to promote
transfer of quinones between the Qp and Qn sites on neighbouring
monomers (Fig. 4a, b). It is noteworthy that the three resolved PQ molecules
inhabit this cavity and that PQ2 assumes a position ‘diagonally’
opposite to PQ1 (Fig. 4c) on the other monomer, as previously suggested.
PQ2 adopts a bowed conformation that straddles the intermonomer
cavity with the distal PQ2 tail appearing to partially obstruct the Qn
site in the neighbouring monomer (Fig. 4d, e). This arrangement may
have functional importance in preventing the simultaneous binding
of PQ molecules at both Qn sites, avoiding competition for electrons
membrane; peripheral helices of cytb6 and subunit IV are shown in cartoon
representation for clarity, to show PQ2 straddling the intermonomer cavity
and sitting between the two cn haems. e, Close-up of the cavity in d. f, g, The
head and tail regions of PQ2 in relation to the cn haems on both sides of the
cavity, highlighting the different orientations of the haem cn propionates, and
the Arg207 and Asp20 side chains. The distances in angstroms between the
residues and cofactors are labelled.


and favouring faster turnover of the Q cycle. Rapid provision of two
electrons for PQ2 bound at a particular Qn site could be facilitated by
the 15.3 Å electron-tunnelling distance between bp haems (Fig. 2c),
which enables rapid inter-monomer electron transfer via the ‘bus-bar’
model from the neighbouring low-potential chain. Alternatively, the
second electron could be provided to the haem cn directly via an
FNR-ferredoxin complex bound at the stromal surface via CET. The haem
cn propionates on the two halves of the cytb6f dimer adopt different
conformations (Fig. 4f, g); in the PQ-vacant site on the opposing
monomer, the haem cn propionate is more closely associated with Arg207
(Fig. 4f), whereas in the PQ-occupied site, the haem cn propionate is
rotated towards the 1,4-benzoquinone ring of PQ2 (Fig. 4g). The altered
ligation of haem cn on PQ binding is consistent with the downshift of
its redox potential, which would strongly favour PQ reduction. We
note that the reduction and oxidation of haem cn is accompanied by
the binding and release of one proton, so only one proton is required
from the stromal side via the Arg207 and Asp20 residues (Fig. 4f, g)
for PQ2 reduction to proceed rapidly, avoiding SPQ formation. It is
also possible to position an oppositely oriented PQ within the density
map, albeit with a less satisfactory fit (Extended Data Fig. 5). A third
PQ molecule (PQ3) (Fig. 2g) has been assigned to the density between
the Qn and Qp binding sites (see Extended Data Fig. 6 for an alternative
assignment as phosphatidylcholine) with the 1,4-benzoquinone ring
near the channel that links the two sides of the intermonomer cavity
and the isoprenyl tail at the mouth of the Qp site. This third PQ may
therefore capture a snapshot of the molecule transitioning between
the Qn and Qp sites in opposite monomers.

The cryo-EM structure of spinach cytb6f reveals the positions of
natively bound PQ and provides details regarding the conformational
switches involved in PQ binding to the Qn site, chlorophyll gating of the
Qp site and PQ exchange between the sites during Q-cycle operation.

Methods


No statistical methods were used to predetermine sample size. The 
experiments were not randomized. The investigators were not blinded 
to allocation during experiments and outcome assessment. 


Complex purification 
Dimeric cytb6f was isolated from dark-adapted market spinach
(S. oleracea) in a procedure adapted from Dietrich and Kühlbrandt.

In brief, spinach leaves were homogenized in buffer 1 (50 mM
Tris-HCl pH 7.5, 200 mM sucrose and 100 mM NaCl). Homogenate was then
filtered and centrifuged for 15 min at 4,540g, 4 °C. Following
centrifugation, the supernatant containing cell debris was discarded and the pellet
resuspended in buffer 2 (10 mM Tricine-NaOH pH 8 and 150 mM NaCl)
before centrifugation again for 15 min (4,540g, 4 °C). The resultant
pellet was resuspended in buffer 3 (2 M NaBr, 10 mM Tricine-NaOH pH 8
and 300 mM sucrose) and incubated on ice for 15 min before diluting
twofold with ice-cold milliQ H2O and centrifuging (15 min, 4,540g, 4 °C).
The resultant pellet was resuspended in buffer 3 and incubated on ice
for 15 min before diluting twofold with ice-cold milliQ H2O and
centrifuging again (15 min, 4,540g, 4 °C). The pellet was resuspended in
buffer 2 and centrifuged for 15 min, 4,540g at 4 °C. The final pellet was
resuspended in a small volume of buffer 4 (40 mM Tricine pH 8.0, 10 mM
MgCl2 and 10 mM KCl). The resultant thylakoid suspension was adjusted
to 10 mg ml⁻¹ chlorophyll (chlorophyll concentrations determined as
described previously).

For selective solubilization of cytb6f, the thylakoid suspension
(10 mg ml⁻¹ chlorophyll) was diluted with membrane extraction buffer
(40 mM Tricine pH 8.0, 10 mM MgCl2, 10 mM KCl and 1.25% (w/v) Hecameg)
to a final concentration of 2 mg ml⁻¹ chlorophyll, 1% (w/v) Hecameg.
The resultant solution was mixed thoroughly then incubated for
2 min at room temperature before dilution to 0.75% (w/v) Hecameg with
buffer 4. Unsolubilized material was removed by ultracentrifugation
at 244,000g at 4 °C for 30 min in a Beckman Ti50.2 rotor.
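The two dilution steps above are internally consistent, which a short calculation can confirm. The sketch below is illustrative only: the 1 ml starting volume and the helper names are assumptions, not part of the published protocol.

```python
def dilute(stock_conc, final_conc, stock_volume):
    """Return (diluent_volume, total_volume) needed to reach final_conc."""
    if final_conc <= 0 or final_conc > stock_conc:
        raise ValueError("final_conc must be positive and not exceed stock_conc")
    total_volume = stock_volume * stock_conc / final_conc
    return total_volume - stock_volume, total_volume


def mixed_conc(conc_a, vol_a, conc_b, vol_b):
    """Concentration of a solute after mixing two solutions."""
    return (conc_a * vol_a + conc_b * vol_b) / (vol_a + vol_b)


# Hypothetical starting volume: 1 ml thylakoids at 10 mg/ml chlorophyll.
thylakoid_ml = 1.0

# Step 1: dilute to 2 mg/ml chlorophyll with extraction buffer (1.25% Hecameg).
buffer_ml, total_ml = dilute(10.0, 2.0, thylakoid_ml)
hecameg_pct = mixed_conc(0.0, thylakoid_ml, 1.25, buffer_ml)

# Step 2: dilute to 0.75% Hecameg with detergent-free buffer 4.
buffer4_ml, final_ml = dilute(hecameg_pct, 0.75, total_ml)
chl_final = 2.0 * total_ml / final_ml

print(f"extraction buffer: {buffer_ml:.2f} ml -> {hecameg_pct:.2f}% Hecameg")
print(f"buffer 4: {buffer4_ml:.2f} ml -> {chl_final:.2f} mg/ml chlorophyll")
```

Four volumes of extraction buffer per volume of thylakoids give both the stated 2 mg ml⁻¹ chlorophyll and 1% Hecameg; the second dilution lowers the chlorophyll concentration to 1.5 mg ml⁻¹.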

The solubilization supernatant was concentrated using a Centriprep
100K centrifugal filter (Merck Millipore) before loading onto a 10-40%
(w/v) continuous sucrose gradient containing 40 mM Tricine pH 8,
10 mM MgCl2, 10 mM KCl, 0.8% (w/v) Hecameg and 0.1 mg ml⁻¹ egg yolk
L-α-phosphatidylcholine (Sigma). This was ultracentrifuged at 174,587g
at 4 °C for 16 h in a Beckman SW32 rotor.

A brownish band containing cytb6f was collected from a region
of the gradient corresponding to ~16% sucrose. This band was
concentrated and loaded onto a ceramic hydroxyapatite column (CHT)
(Type 1, Bio-Rad) pre-equilibrated in 20 mM Hecameg, 0.1 mg ml⁻¹
phosphatidylcholine and 20 mM Tricine pH 8. The column was washed
with 5 column volumes of CHT wash buffer (20 mM Hecameg,
0.1 mg ml⁻¹ phosphatidylcholine and 100 mM ammonium phosphate pH 8)
before bound material was eluted with CHT elution buffer (20 mM
Hecameg, 0.1 mg ml⁻¹ phosphatidylcholine and 400 mM ammonium
phosphate pH 8).


Detergent exchange and gel filtration 

Concentrated CHT eluate was loaded onto a 10-35% (w/v) continuous
sucrose gradient containing 50 mM HEPES pH 8, 20 mM NaCl and
0.3 mM 4-trans-(4-trans-propylcyclohexyl)-cyclohexyl α-maltoside
(tPCCαM) and ultracentrifuged at 175,117g at 4 °C for 16 h in a
Beckman SW41 rotor.

A single brown band containing cytb6f was collected from a region
of the gradient corresponding to ~22% sucrose. This band was
concentrated and loaded onto a HiLoad 16/600 Superdex 200 pg gel filtration
column (GE Healthcare) connected to an ÄKTA prime plus purification
system (GE Healthcare). The column was run at a rate of 0.2 ml min⁻¹
with 145 ml of gel filtration buffer (50 mM HEPES pH 8, 20 mM NaCl,
0.3 mM tPCCαM). Eluted fractions comprising dimeric cytb6f were
pooled and concentrated.


SDS-PAGE and BN-PAGE analysis 

Samples collected from each purification step were analysed by
SDS-PAGE and BN-PAGE. For SDS-PAGE, precast NuPAGE 12% Bis-Tris gels
(Invitrogen) were run for 60 min at 180 V before staining with
Coomassie blue. For BN-PAGE, precast NativePAGE 3-12% Bis-Tris gels
(Invitrogen) were run for 120 min at 160 V before staining with Coomassie blue.
Gels were imaged using an Amersham 600 imager (GE Healthcare).


Quantification of purified dimeric cytb6f using redox difference
spectra

Absorbance spectra were recorded at room temperature on a
Cary60 spectrophotometer (Agilent). For redox difference spectra,
cytochromes were first fully oxidized with a few grains of potassium
ferricyanide followed by reduction with a few grains of sodium ascorbate
(cytf) then sodium dithionite (cytf and cytb6). At each stage the
sample was mixed thoroughly and incubated for ~1 min before recording
spectra. Redox difference spectra (ascorbate-reduced minus
ferricyanide-oxidized and dithionite-reduced minus ascorbate-reduced) were
calculated and used to determine the concentrations of c-type haem
of cytf and the two b-type haems of cytb6 using extinction coefficients
of 25 mM⁻¹ cm⁻¹ (haem f) and 21 mM⁻¹ cm⁻¹ (cytb6 haems).
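The conversion from a redox difference absorbance to a haem concentration follows the Beer-Lambert law. The sketch below uses the two extinction coefficients quoted above; the absorbance differences and the 1 cm path length are invented placeholder values, not measurements from this study.

```python
EPS_HAEM_F = 25.0    # mM^-1 cm^-1, haem f (from the text)
EPS_HAEM_B6 = 21.0   # mM^-1 cm^-1, cytb6 haems (from the text)


def haem_conc_mM(delta_abs, epsilon, path_cm=1.0):
    """Beer-Lambert: concentration (mM) = delta A / (epsilon * path length)."""
    return delta_abs / (epsilon * path_cm)


# Hypothetical difference-spectrum amplitudes (placeholders).
delta_a_f = 0.050    # ascorbate-reduced minus ferricyanide-oxidized
delta_a_b6 = 0.084   # dithionite-reduced minus ascorbate-reduced

cyt_f_mM = haem_conc_mM(delta_a_f, EPS_HAEM_F)
cyt_b6_mM = haem_conc_mM(delta_a_b6, EPS_HAEM_B6)
ratio = cyt_b6_mM / cyt_f_mM

print(f"haem f: {cyt_f_mM * 1000:.1f} uM, b-type haems: {cyt_b6_mM * 1000:.1f} uM")
print(f"b-type : c-type haem ratio = {ratio:.2f}")
```

With these placeholder inputs the b-type to c-type haem ratio comes out at 2, the value expected for an intact complex carrying two b-type haems per haem f.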


Reduction of decylplastoquinone 

Approximately 0.1 mg decylplastoquinone (Merck) was dissolved in
100 µl ethanol, mixed with a few grains of sodium dithionite dissolved
in 100 µl milliQ H2O and vortexed until the solution became
colourless. Decylplastoquinol was extracted by mixing with 0.5 ml hexane,
vortexing and centrifuging at 16,000g for 2 min. The hexane layer was
carefully removed ensuring none of the aqueous phase was collected.
Hexane extraction was repeated on the aqueous phase twice more, then
the hexane solutions were pooled and dried in a rotary evaporator at
30 °C for 1 h before re-dissolving in ~100 µl DMSO. Decylplastoquinol
concentration was determined by diluting 10 µl of the DMSO solution
into 795 µl ethanol, recording the absorbance spectrum between 250
and 350 nm and using an extinction coefficient of 3,540 M⁻¹ cm⁻¹ at
290 nm.
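As a worked illustration of this determination, the stock concentration is the Beer-Lambert concentration of the diluted sample multiplied by the dilution factor (805 µl total / 10 µl sample). The absorbance reading and the 1 cm path length below are hypothetical values chosen only to show the arithmetic.

```python
EPS_290 = 3540.0       # M^-1 cm^-1 at 290 nm (from the text)
SAMPLE_UL = 10.0       # DMSO stock added to the cuvette
ETHANOL_UL = 795.0     # ethanol diluent

dilution_factor = (SAMPLE_UL + ETHANOL_UL) / SAMPLE_UL


def stock_conc_mM(a290, path_cm=1.0):
    """Stock concentration (mM) from the A290 of the diluted sample."""
    diluted_molar = a290 / (EPS_290 * path_cm)
    return diluted_molar * dilution_factor * 1000.0


# Hypothetical reading (placeholder, not a reported measurement).
a290 = 0.44
stock_mM = stock_conc_mM(a290)

print(f"dilution factor: {dilution_factor:.1f}")
print(f"decylplastoquinol stock: {stock_mM:.2f} mM")
```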


Purification of PC 

Plastocyanin (PC) was purified in its oxidized form from market spinach.
In brief, spinach leaves were homogenized in buffer containing 50 mM sodium
phosphate pH 7.4, 5 mM MgCl2 and 300 mM sucrose. Homogenate was
then filtered and centrifuged for 15 min at 4,000g. Following
centrifugation, the supernatant containing cell debris was discarded and the
pellet was resuspended in buffer containing 10 mM Tricine pH 7.4 and
5 mM MgCl2. The solution was incubated on ice for 1 min before diluting
twofold with buffer containing 10 mM Tricine pH 7.4, 5 mM MgCl2 and
400 mM sucrose and centrifuging for 15 min at 4,000g. Following
centrifugation, the pellet was resuspended to a chlorophyll concentration of
2 mg ml⁻¹ in buffer containing 10 mM HEPES pH 7.6, 5 mM NaCl and
5 mM EDTA, and sonicated for 10 min, at 30 s intervals. The solution was
centrifuged at 200,000g for 1 h to pellet any large unbroken material.
The supernatant was applied to four 5-ml GE Healthcare Hi-TRAP Q FF
anion-exchange columns in series, equilibrated in HEPES pH 8, 5 mM
NaCl. A gradient of 0.005–1 M NaCl was used for elution, with PC
eluting at around 200 mM. PC-containing fractions were identified by the
blue colour on addition of potassium ferricyanide. These fractions were
pooled, concentrated in a Vivaspin 3-kDa molecular-weight cut-off spin
concentrator and loaded onto a Superdex 200 16/600 FPLC column,
equilibrated with 20 mM HEPES pH 8 and 20 mM NaCl. The resulting
PC fractions were pooled, concentrated and frozen at −80 °C until use.


Activity assays 
Reduction of PC by cytb6f was monitored by stopped-flow
absorbance spectroscopy using an Olis RSM 1000 rapid-scanning
spectrophotometer equipped with a USA-SF stopped flow cell at 20 °C.
Solution A (231.25 nM cytb6f and 62.5 µM PC in 50 mM HEPES pH 8,
20 mM NaCl and 0.3 mM tPCCαM) and solution B (1.25 mM
decylplastoquinol in the same buffer) were prepared and the reaction was initiated
by mixing the solutions in a 4:1 volumetric ratio (final concentrations:
185 nM cytb6f, 50 µM PC and 250 µM decylplastoquinol). PC
reduction was monitored by recording absorbance spectra between 420
and 750 nm at a rate of 62 scans s⁻¹ and plotting the change in
absorbance at 597 nm. In a control reaction, cytb6f was omitted to record
the uncatalysed reduction of PC by decylplastoquinol. Fitting of the
initial reaction rates was performed in Origin. All measurements were
carried out in triplicate.
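The stated final concentrations follow directly from the 4:1 (A:B) mixing ratio, as the short check below shows. The function name is illustrative; the concentrations are those given in the text.

```python
def after_mixing(conc, parts_this, parts_other):
    """Concentration of a component after volumetric mixing with another solution."""
    return conc * parts_this / (parts_this + parts_other)


PARTS_A, PARTS_B = 4, 1  # solution A : solution B

cytb6f_nM = after_mixing(231.25, PARTS_A, PARTS_B)   # from solution A
pc_uM = after_mixing(62.5, PARTS_A, PARTS_B)         # from solution A
dpqh2_uM = after_mixing(1250.0, PARTS_B, PARTS_A)    # 1.25 mM in solution B

print(f"cytb6f: {cytb6f_nM:.0f} nM, PC: {pc_uM:.0f} uM, "
      f"decylplastoquinol: {dpqh2_uM:.0f} uM")
```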


Cryo-EM specimen preparation and data acquisition

In brief, 3 µl of purified cytb6f (~17 µM) was applied to freshly
glow-discharged holey carbon grids (Quantifoil R1.2/1.3, 400 mesh Cu). The
grids were blotted for 2 s at 8 °C then plunge frozen into liquid ethane
using a Leica EM GP at 90% relative humidity. Data acquisition was
carried out on a Titan Krios microscope operated at 300 kV (Thermo
Fisher) equipped with an energy filtered (slit width 20 eV) K2 summit
direct electron detector. A total of 6,035 movies were collected in
counting mode at a nominal magnification of 130,000x (pixel size of
1.065 Å) and a dose of 4.6 e⁻ Å⁻² s⁻¹ (see Extended Data Table 1). An exposure
time of 12 s was used and the resulting movies were dose-fractionated
into 48 fractions. A defocus range of −1.5 to −2.5 µm was used.
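The acquisition parameters above fix the total exposure and the dose per movie fraction. The sketch below assumes that the quoted dose is a rate in e⁻ Å⁻² s⁻¹ (the unit is partly garbled in this copy, so this reading is an assumption); Extended Data Table 1 remains the authoritative source.

```python
DOSE_RATE = 4.6      # e-/A^2/s (assumed unit, see note above)
EXPOSURE_S = 12.0    # s per movie
N_FRACTIONS = 48     # fractions per movie
PIXEL_SIZE_A = 1.065 # A per pixel

total_dose = DOSE_RATE * EXPOSURE_S           # e-/A^2 per movie
dose_per_fraction = total_dose / N_FRACTIONS  # e-/A^2 per fraction
time_per_fraction = EXPOSURE_S / N_FRACTIONS  # s per fraction
nyquist_a = 2.0 * PIXEL_SIZE_A                # finest resolvable spacing, A

print(f"total dose: {total_dose:.1f} e-/A^2")
print(f"per fraction: {dose_per_fraction:.2f} e-/A^2 in {time_per_fraction:.2f} s")
print(f"Nyquist limit: {nyquist_a:.2f} A")
```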


Image processing and 3D reconstruction 

Beam-induced motion correction and dose-fractionation were carried
out using MotionCor2. Contrast transfer function (CTF) parameters
of the dose-weighted motion-corrected images were then estimated
using GCTF. All subsequent processing steps were performed using
RELION 2.1 or 3.0 unless otherwise stated.

In total, 422,660 particles were manually picked from 6,035
micrographs. These particles were extracted using a box size of 220 x 220
pixels and subjected to reference-free 2D classification. A typical
micrograph showing picked particles is shown in Extended Data Fig. 2a, b.
Particles that were categorized into poorly defined classes were rejected,
while the remaining 292,242 (69.2%) particles were used for further
processing. A subset of 30,000 particles was used to generate a de novo
initial model using the ‘3D initial model’ subroutine. The initial model,
low-pass filtered to 20 Å, was used as a reference map for subsequent
3D classification into ten 3D classes. One stable 3D class at a resolution
of 5.38 Å was selected for high-resolution 3D auto-refinement; this
class accounted for a subset of 108,560 particles (25.6%). This
subset of refined particles was then re-extracted and re-centred before
another round of 3D auto-refinement was carried out. The resultant
4.85 Å density map was corrected for the modulation transfer function
(MTF) of the Gatan K2 summit camera then further sharpened using the
post-processing procedure to 4.02 Å. Per-particle CTF-refinement was
carried out and a soft mask was created which included the detergent
shell. The final global resolution estimate of 3.58 Å was based on the
gold-standard Fourier shell correlation (FSC) cut-off of 0.143.
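For readers unfamiliar with the FSC criterion, the reported resolution is the reciprocal of the spatial frequency at which the half-map correlation curve first falls below 0.143. The following is a minimal sketch of that read-out using linear interpolation; it is not the RELION implementation, and the curve values are invented toy data rather than the FSC curve from this study.

```python
def fsc_resolution(spatial_freq, fsc, threshold=0.143):
    """Resolution (A) where the FSC curve first drops below the threshold.

    spatial_freq: ascending spatial frequencies in 1/A.
    fsc: correlation values at those frequencies.
    Returns None if the curve never crosses the threshold.
    """
    for i in range(1, len(fsc)):
        if fsc[i - 1] >= threshold > fsc[i]:
            # Linear interpolation between the two bracketing shells.
            frac = (fsc[i - 1] - threshold) / (fsc[i - 1] - fsc[i])
            freq = spatial_freq[i - 1] + frac * (spatial_freq[i] - spatial_freq[i - 1])
            return 1.0 / freq
    return None


# Toy FSC curve (placeholder values for illustration only).
toy_freq = [0.05, 0.10, 0.20, 0.25, 0.30]
toy_fsc = [1.00, 0.95, 0.60, 0.243, 0.043]

toy_resolution = fsc_resolution(toy_freq, toy_fsc)
print(f"FSC = 0.143 resolution of the toy curve: {toy_resolution:.2f} A")
```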

Local resolution was determined using one of two unfiltered
half-maps as an input, a calibrated pixel size of 1.065 Å and a B-factor of −103.
The output local resolution map is shown in Extended Data Fig. 2d, e.


Model building 

Initially, a homology-based approach was performed using the
crystallographic structure of Nostoc sp. PCC 7120 cytb6f (PDB: 4OGQ) as a
template. Sequence alignments of the eight polypeptide subunits of
cytb6f were carried out using Clustal Omega (Extended Data Figs. 7,
8). The model was rigid-body docked into the density using the ‘fit
in map’ tool in Chimera. This was then followed by manual
adjustment and real-space refinement using COOT. Sequence assignment
and fitting was guided by bulky residues such as Arg, Trp, Tyr and Phe.


After fitting of the polypeptide chains and cofactors in one half of the 
dimeric complex, the other half of the complex was then independently 
fitted into the C1 density map. Once both halves of the complex were 
fitted, cofactors, lipids and plastoquinone-9 molecules were fitted 
into regions of unassigned density. The final model underwent global 
refinement and minimization using the real space refinement tool in 
PHENIX. The final refinement statistics are summarized in Extended
Data Table 1. 


Pigment analysis by reversed-phase HPLC 

Pigments were extracted from purified cytb6f with 7:2 acetone:methanol
(v/v) and clarified extracts were separated by reversed-phase HPLC
at a flow rate of 1 ml min⁻¹ at 40 °C using a Supelco Discovery HS C18
column (5 µm particle size, 120 Å pore size, 25 cm × 4.6 mm) on an
Agilent 1200 HPLC system. The column was equilibrated in acetonitrile:
water:trimethylamine (9:1:0.01 v/v/v) and pigments were eluted by
applying a linear gradient of 0–100% ethyl acetate over 15 min followed
by isocratic elution with 100% ethyl acetate for a further 5 min. Elution
of carotenoid and chlorophyll species was monitored by absorbance at
400, 450, 490 and 665 nm. Chlorophyll a was identified by its
absorption spectra and known retention time. The major carotenoid species
was confirmed as 9-cis β-carotene using a standard obtained from
Sigma-Aldrich (product no. 52824).
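The elution programme described above (a 15 min linear ramp to 100% ethyl acetate followed by a 5 min isocratic hold) can be written as a simple function of time. This is an illustrative sketch; the function name and the decision to reject times outside the 20 min programme are assumptions.

```python
RAMP_MIN = 15.0   # linear 0-100% ethyl acetate
HOLD_MIN = 5.0    # isocratic 100% ethyl acetate


def ethyl_acetate_percent(t_min):
    """Percentage of ethyl acetate in the mobile phase at time t_min."""
    if t_min < 0 or t_min > RAMP_MIN + HOLD_MIN:
        raise ValueError("time is outside the 20 min elution programme")
    if t_min <= RAMP_MIN:
        return 100.0 * t_min / RAMP_MIN
    return 100.0


for t in (0.0, 7.5, 15.0, 20.0):
    print(f"{t:4.1f} min: {ethyl_acetate_percent(t):5.1f}% ethyl acetate")
```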


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 


All relevant data are available from the authors and/or are included 
with the manuscript or in the Supplementary Information. Atomic 
coordinates and the cryo-EM density map have been deposited in the 
Protein Data Bank under accession number 6RQF and the Electron
Microscopy Data Bank (EMDB) under accession number EMD-4981. 
Extended Data Fig. 1 | Purification of cytb6f from spinach. a, Absorption
spectrum of ascorbate-reduced purified b6f complex. The peak at 421 nm
corresponds to the Soret band of bound pigments (chlorophyll a and haems).
The peaks at 554 and 668 nm correspond to c-type haems of cytf and
chlorophyll a, respectively. The inset panel shows redox difference spectra of
ascorbate-reduced minus ferricyanide-oxidized b6f (dashed line) and
dithionite-reduced minus ascorbate-reduced (dotted line) cytb6f. Redox
difference spectra show haem f absorption peaks at 523 and 554 nm as well as
absorption peaks at 534 and 563 nm corresponding to the b-type haems of
cytb6. The calculated ratio of cytb6 b-type haems to the c-type haem of cytf was
~2 using extinction coefficients of 25 mM⁻¹ cm⁻¹ (f) and 21 mM⁻¹ cm⁻¹ (b6). The
spectra exhibit the absorption properties characteristic of intact cytb6f.
Spectra were recorded at room temperature. b, SDS-PAGE analysis of purified
cytb6f indicates that the sample is highly pure, with the four large subunits of
the complex (cytf, cytb6, the Rieske ISP and subunit IV) running at ~31 kDa,
~24 kDa, ~20 kDa and ~17 kDa, respectively and the four small subunits (PetG,
PetL, PetM and PetN) running at around 4 kDa (not shown). c, d, Negative-stain
and BN-PAGE analysis of purified cytb6f demonstrates the sample is dimeric
and highly homogenous, with a single band corresponding to dimeric cytb6f
shown in lane 1. Lane 2 shows a sample that has been deliberately monomerized
following incubation with 1% Triton X-100 for 1 h. For gel source data see
Supplementary Fig. 1. e, The catalytic rate of plastocyanin reduction by the
purified dimeric cytb6f complex as determined by stopped-flow absorbance
spectroscopy. A rate of 200 e⁻ s⁻¹ was determined by taking the initial linear
region from the enzyme-catalysed reaction (solid line) and subtracting the
background rate measured in the absence of enzyme (long-dashed line).
Plastocyanin reduction was not observed in the absence of decylplastoquinol
(short-dashed line). Reactions were initiated upon addition of
decylplastoquinol to the solution containing plastocyanin and b6f while
monitoring the loss of absorbance at 597 nm. Final concentrations were 50 µM
plastocyanin, 185 nM b6f and 250 µM decylplastoquinol. All experiments were
performed in triplicate and controls were performed in the absence of b6f or
decylplastoquinol.
Extended Data Fig. 2 | Cryo-EM micrographs of the spinach cytb6f complex
and calculation of the cryo-EM map global and local resolution. a, Cytb6f
particles covered by a thin layer of vitreous ice on a supported carbon film.
b, Examples of dimeric cytb6f particles are circled in green. We recorded 6,035
cryo-EM movies, from which 422,660 particles were picked manually for
reference-free 2D classification. The final density map was calculated from
108,560 particles. c, Gold-standard refinement was used for estimation of the
final map resolution (solid black line). The global resolution of 3.58 Å was
calculated using an FSC cut-off at 0.143. A model-to-map FSC curve (solid grey
line) was also calculated. d, e, A C1 density map of the cytb6f complex both with
(d) and without (e) the detergent shell. The map is coloured according to local
resolution estimated by RELION and viewed from within the plane of the
membrane. The colour key on the right shows the local structural resolution in
angstroms (Å).
Extended Data Fig. 3 | Cryo-EM densities and structural models of polypeptides in the cytb6f complex. Polypeptides are coloured as in Fig. 1. The contour
levels of the density maps were adjusted to 0.0144.
Extended Data Fig. 4 | Cryo-EM densities and structural models of prosthetic
groups, lipids and plastoquinone molecules in the cytb6f complex. c-type
haems (f, cn; dark blue), b-type haems (bp, bn; red), 9-cis β-carotene (orange),
chlorophyll a (major conformation, dark green; minor conformation, light
green), 2Fe-2S (burnt orange and yellow), plastoquinones (yellow),
monogalactosyl diacylglycerol (light pink), phosphatidylcholine (light cyan),
sulfoquinovosyl diacylglycerol (light green) and phosphatidylglycerol (light
purple). The contour levels of the density maps were adjusted to 0.0068.
Extended Data Fig. 5 | Alternative interpretation of the region assigned as
PQ2. a, b, The density map showing two possible alternative conformations for
PQ2, the major conformation (a) and the alternative conformation (b).
Cofactors are coloured as in Extended Data Fig. 4 with b-type haems (bp and bn)
coloured red, c-type haems (cn) coloured dark blue, chlorophyll a (major
conformation) coloured dark green, plastoquinones coloured yellow and the
cytb6 subunit coloured light green. The contour level of the density map was
adjusted to 0.0089.


Extended Data Fig. 6 | Alternative interpretations of the density map in the
region assigned as PQ3. a, b, The density map modelled with a plastoquinone
molecule (a) and a phosphatidylcholine molecule (b). Top, the protein-free
density map; bottom, the map including cytb6 (green). The 2.9 Å distance
indicates a close contact between the PQ3 head group and the conserved
Lys208. Cofactors are coloured as in Extended Data Fig. 4 with b-type haems (bp
and bn) coloured red, chlorophyll a (major conformation) in dark green,
plastoquinones in yellow, phosphatidylcholine in light cyan, sulfoquinovosyl
diacylglycerol in mint green and the cytb6 subunit in light green. The contour
level of the density map was adjusted to 0.0127.
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a 


Extended Data Fig. 7 | Multiple sequence alignment of cytb, fsubunits cytf 
and cytb,.a, b, Sequences of cytf(a) and cytb, (b) from cyanobacterial (M. 
laminosus and Nostoc sp. PCC7120), algal (C. reinhardtii) and plant (S. oleracea) 
subunits were aligned in Clustal Omega v.1.2.4. Conserved identities are 
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MSKVYDWFEERLEIQAIADDITSKYVPPHVNIFYCLGGITLTCFLVOVATGFAMTFYYRP 
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TRYYSAHTFVLPWLIAVFMLLHFLMIRKOGISGPL 215 
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indicated by asterisks, and similarities by double or single dots. Polar residues 
are coloured in green, positively charged residues are pink, hydrophobic 
residues are red and negatively charged residues are blue. The sequences omit 


[Sequence alignment panels: Mastigocladus, Nostoc, Chlamydomonas and Spinach sequences with conservation marks.]


Extended Data Fig. 8 | Multiple sequence alignment of the Rieske ISP,
subunit IV, PetG, PetL, PetM and PetN. a-f, Sequences of Rieske ISP (a),
subunit IV (b), PetG (c), PetL (d), PetM (e) and PetN (f) from cyanobacterial (M.
laminosus and Nostoc sp. PCC7120), algal (C. reinhardtii) and plant (S. oleracea)
subunits were aligned in Clustal Omega v.1.2.4. Conserved identities are
indicated by asterisks, and similarities by double or single dots. Polar residues
are coloured in green, positively charged residues are pink, hydrophobic
residues are red and negatively charged residues are blue. The sequences omit
signal peptides.
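
The asterisk row described in the caption can be illustrated with a minimal Python sketch. It marks fully conserved columns only; the double and single dots that Clustal Omega uses for similar residues depend on residue-group tables that are not reproduced here. The function name and the toy sequences are hypothetical and are not taken from the figure.

```python
def identity_marks(aligned_seqs):
    """Return a string with '*' under columns where every sequence has the
    same residue and ' ' elsewhere. Gap characters ('-') never count."""
    if not aligned_seqs:
        return ""
    length = len(aligned_seqs[0])
    if any(len(s) != length for s in aligned_seqs):
        raise ValueError("aligned sequences must have equal length")
    marks = []
    for column in zip(*aligned_seqs):
        first = column[0]
        conserved = first != "-" and all(res == first for res in column)
        marks.append("*" if conserved else " ")
    return "".join(marks)


# Toy alignment (made-up sequences, for illustration only).
toy = ["MKV-LA", "MRVALA", "MKIALG"]
for seq in toy:
    print(seq)
print(identity_marks(toy))
```

Printing the mark string directly under the aligned rows reproduces the layout of a Clustal-style alignment, with the conservation row sharing the column positions of the sequences.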
Extended Data Table 1 | Cryo-EM data collection, refinement and validation statistics

S. oleracea cytb₆f
(EMD-4981)
(PDB 6RQF)
Data collection and processing
Magnification 130,000×
Voltage (kV) 300
Electron exposure (e⁻/Å²)
Defocus range (µm)
Pixel size (Å)
Symmetry imposed
Initial particle images (no.)
Final particle images (no.)
Map resolution (Å)
FSC threshold
Map resolution range (Å)
Refinement
Initial model used (PDB code)
Model resolution (Å)
FSC threshold
Model resolution range (Å)
Map sharpening B factor (Å²)
Model composition
Non-hydrogen atoms
Protein residues
Ligands
B factors (Å²)
Protein
Ligand
R.m.s. deviations (PHENIX)
Bond lengths (Å)
Bond angles (°)
Validation
MolProbity score
Clashscore
Poor rotamers (%)
Ramachandran plot
Favored (%)
Allowed (%)
Disallowed (%)


1.15 (55.2 e- on 48 frames) 
-1.5 to -2.5 


RELION de novo model from 30,000 particles 


3.58 
0.143 
~3.3-8.7 


Estimated automatically using RELION* 


16,359 
1,944 
29 


RELION auto-estimated 
RELION auto-estimated 


*Data from ref. *°. 
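
The 0.143 FSC threshold listed in the table is the cutoff at which a Fourier shell correlation curve is read off to quote a resolution. The following is a minimal sketch of that read-out, assuming linear interpolation between sampled shells. The function name and the toy curve are hypothetical and are not data from this study.

```python
def resolution_at_threshold(spatial_freqs, fsc_values, threshold=0.143):
    """Return the resolution (1 / spatial frequency) at which the FSC curve
    first drops below `threshold`, using linear interpolation.

    spatial_freqs: increasing spatial frequencies in 1/Å.
    fsc_values:    FSC value for each shell.
    Returns None if the curve never crosses the threshold.
    """
    points = list(zip(spatial_freqs, fsc_values))
    for (f0, c0), (f1, c1) in zip(points, points[1:]):
        if c0 >= threshold > c1:
            # Fraction of the way from shell 0 to shell 1 where FSC == threshold.
            t = (c0 - threshold) / (c0 - c1)
            crossing = f0 + t * (f1 - f0)
            return 1.0 / crossing
    return None


# Toy FSC curve (illustrative numbers only).
freqs = [0.10, 0.20, 0.30, 0.40]
fsc = [1.00, 0.80, 0.20, 0.00]
print(round(resolution_at_threshold(freqs, fsc), 2))
```

Processing packages apply their own masking and correction steps before reporting a value, so this sketch shows only the final threshold crossing.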


Extended Data Table 2 | Comparison of cofactor distances in b₆f and bc₁ dimers from different species

                      (PDB 6RQF)    (PDB 1Q90)       (PDB 2E74)     (PDB 4OGQ)
Source                S. oleracea   C. reinhardtii   M. laminosus   Nostoc sp. PCC 7120
Resolution (Å)        3.6           3.1              3.0            2.5
Inhibitors*           -             TDS (Qₚ)         -              -
Distances:
bₙ - cₙ (Å)           4.7, 4.7      4.7, 4.7         4.7, 4.7       4.6, 4.6
bₙ - bₚ (Å)           12.1, 12.0    12.2, 12.2       12.2, 12.2     12.1, 12.1
bₚ - bₚ (Å)           15.3          15.1             15.2           15.3
bₚ - [2Fe-2S] (Å)     25.6, 25.5    22.9, 22.9       25.5, 25.5     25.3, 25.3
[2Fe-2S] - f (Å)      25.9, 26.1    27.8, 27.8       26.2, 26.2     26.2, 26.2

                      (PDB 1BCC) (distal)   (PDB 3BCC) (proximal)
Source                G. gallus             G. gallus
Resolution (Å)        3.2                   3.7
Inhibitors*           -                     STG (Qₚ), AMY (Qₙ)
Distances:
bₙ - bₚ (Å)           12.4, 12.4            12.3, 12.3
bₚ - bₚ (Å)           14.4                  14.5
bₚ - [2Fe-2S] (Å)     30.3, 30.3            23.0, 23.1
[2Fe-2S] - c₁ (Å)     16.8, 16.8            21.3, 21.0

The distances are edge-to-edge (Å), for each half of the b₆f dimer from different species (PDB: 6RQF, 1Q90, 2E74, 4OGQ) and the bc₁ dimer from Gallus gallus with the Rieske ISP in its distal (PDB:
1BCC) and proximal (PDB: 3BCC) positions. AMY, antimycin; STG, stigmatellin.
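
Edge-to-edge here means the shortest atom-to-atom separation between two cofactors rather than the distance between their centres. The following is a minimal sketch of the difference, assuming each cofactor is given as a list of (x, y, z) atom coordinates in Å. The function names and coordinates are hypothetical and are not taken from the deposited models.

```python
import math


def edge_to_edge(atoms_a, atoms_b):
    """Shortest distance between any atom of A and any atom of B."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def centroid(atoms):
    """Arithmetic mean position of a set of atoms."""
    n = len(atoms)
    return tuple(sum(coord) / n for coord in zip(*atoms))


def centre_to_centre(atoms_a, atoms_b):
    """Distance between the centroids of A and B."""
    return math.dist(centroid(atoms_a), centroid(atoms_b))


# Two made-up 'cofactors', each with two atoms lying on the x axis.
cofactor_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
cofactor_b = [(4.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
print(edge_to_edge(cofactor_a, cofactor_b))      # 3.0
print(centre_to_centre(cofactor_a, cofactor_b))  # 4.5
```

For extended groups such as haems the two measures can differ by several ångströms, which is why the table states which one is reported.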
Transposons have had a pivotal role in genome evolution’ and are believed to be the
evolutionary progenitors of the RAG1-RAG2 recombinase’, an essential component
of the adaptive immune system in jawed vertebrates’. Here we report one crystal
structure and five cryo-electron microscopy structures of Transib**, a RAG1-like 
transposase from Helicoverpa zea, that capture the entire transposition process from 
the apo enzyme to the terminal strand transfer complex with transposon ends 
covalently joined to target DNA, at resolutions of 3.0-4.6 A. These structures reveal a 
butterfly-shaped complex that undergoes two cycles of marked conformational 
changes in which the ‘wings’ of the transposase unfurl to bind substrate DNA, close to 
execute cleavage, open to release the flanking DNA and close again to capture and 
attack target DNA. Transib possesses unique structural elements that compensate for 
the absence of a RAG2 partner, including a loop that interacts with the transposition 
target site and an accordion-like C-terminal tail that elongates and contracts to help 
to control the opening and closing of the enzyme and assembly of the active site. Our 
findings reveal the detailed reaction pathway of a eukaryotic cut-and-paste 
transposase and illuminate some of the earliest steps in the evolution of the RAG
recombinase.


Transposons are present in all kingdoms of life and move within or 
between genomes using transposon-encoded transposases®. Many 
DNA transposases and retroviral integrases contain a conserved RNase 
H-like (RNH) domain that uses three acidic residues (the DDE/D motif) 
to coordinate magnesium and catalyse DNA cleavage and integration’. 
The RAG1-RAG2 recombinase (RAG), which shares this RNase H cata- 
lytic domain®, generates DNA double-strand breaks at recombination 
signal sequences (RSSs) to initiate V(D)J recombination in developing
lymphocytes of jawed vertebrates*”. The RAG1 catalytic core and
RSSs are thought to have evolved from the transposase and terminal
inverted repeats (TIRs), respectively, of an ancient Transib transposon”.
Acquisition of a RAG2-like gene by a Transib element is proposed to
have generated a ‘RAG transposon’ that subsequently had a key role
in the evolution of RAG1-RAG2 loci and V(D)J recombination’. Unlike
cut-and-paste transposition, which is an excision-and-integration reaction,
V(D)J recombination is an excision-and-end joining reaction that
rejoins the ends of the excised segment to protect the genome against 
hazardous insertions (Fig. 1a). Thus, RAG has been subject to different 
evolutionary constraints to its transposase ancestors, particularly in 
the events that occur after DNA cleavage. 

Transib from H. zea (hereafter designated Transib unless otherwise 
specified) is an active transposon with a TIR that resembles a portion 
of the RSS* (Fig. 1b) and a transposase (Transib protein) that cleaves 
DNA using a nick-hairpin mechanism similar to that of RAG and the 
hAT family transposase Hermes*” (Fig. 1a). Transib*, Hermes” and
RAG" “ are active in vitro for the subsequent strand-transfer reaction
that completes transposition; however, for RAG, this step is strongly
suppressed in vivo’. 

Recent advances in RAG structural biology have clarified the molecu- 
lar basis for RSS recognition and cleavage** ”. However, transposition 
mediated by DDE/D family enzymes that proceed via hairpinning is 
less well understood, particularly at the final step of integration into 
target DNA. In contrast to the availability of structures capturing the 
strand-transfer complexes of bacteriophage Mu’ and retroviral inte- 
grases””’, transposon integration has been visualized structurally for 
only one eukaryotic DNA transposase, Mos1”?™, which has a catalytic 
mechanism that does not involve a hairpin intermediate”. As the only 
known active Transib transposase, H. zea Transib provides a unique 
opportunity for the analysis of a RAG2-independent RAG1-family
protein and for comparative insights into the effect of RAG2 on RAG1 
function and RAG evolution. 

Here we describe near-atomic resolution crystal and cryo-electron 
microscopy (cryo-EM) structures of H. zea Transib in the apo form and 
complexed with intact TIR substrate, nicked TIR substrate, cleaved 
transposon ends and transposon ends covalently joined to target DNA 
(Extended Data Figs. 1-4, Extended Data Tables 1, 2). An additional 
complex, with Transib bound to transposon ends and target DNA 
before strand transfer, was also observed in and modelled from the 
cryo-EM data. These structures represent, to our knowledge, the most 
complete structural description to date of a eukaryotic cut-and-paste 
transposition reaction, explain the target site sequence preferences of 
RAG-family transposases and reveal the conformational changes that
enable the same catalytic centre to perform both transposon excision
and integration.

Fig. 1 | Functional and structural overview of H. zea Transib. a, Schematic of
DNA recombination and transposition pathways of RAG and Transib. RSS or TIR
are shown as triangles with wide side indicating heptamer sequence. TSD,
target site duplication. b, Sequence and numbering of the Transib TIR
substrate. Heptamer and nonamer sequences of TIR and RSS are shown in red.
TS, transferred strand. NTS, non-transferred strand. The nicking site on Transib
TIR is between T-1L and C1 on NTS. c, Domain organization of H. zea Transib
compared with mouse RAG1. Domain boundaries are shown by residue
number. Active site carboxylates are labelled in red. d, Front and top view of the
apo Transib dimer crystal structure.


Opening of Transib upon TIR engagement 


Apo Transib exhibits a modular domain arrangement similar to that 
of RAG1® (Fig. 1c, d). The N-terminal dimerization and DNA-binding
domain (DDBD) serves as the dimerization interface and is connected
by an extended pre-RNase H (PreR) loop to a split RNH domain containing
three conserved catalytic carboxylates’ (D125, D224 and E435), all
of which are required for activity° (Extended Data Fig. 1c, d). E435 is
separated from the rest of RNH by two zinc-binding domains, ZnC₂ and
ZnH₂ (collectively, ZnB), which form a C₂H₂ zinc finger (Extended Data
Fig. 5a), as in RAG1®. The following C-terminal domain (CTD) folds back
to interact with DDBD, and the protein ends with a C-terminal tail (CTT)
of about 30 amino acids, made up of three short helices that bridge
from DDBD to ZnB. The absence of a nonamer-binding domain (NBD)
(Extended Data Fig. 5b) is consistent with the observation that Transib 
TIRs have sequence similarity to the heptamer but not the nonamer of 
the RSS*”¢ (Fig. 1b). 

Despite low (16.4%) sequence identity between the Transib and RAG1 
core, individual domains from the two proteins are readily superimpos- 
able (Extended Data Fig. 5a), providing support for the model in which 
Transib and RAG1 are evolutionarily related. These alignments also 
reveal several differences between Transib and RAG1, three of which
(red boxes in Extended Data Fig. 5a) are extended structural elements
in RAG1, absent from Transib, that together constitute a substantial
portion of the RAG2-binding interface in RAG1 (Extended Data Fig. 5c).
These three missing elements explain the absence of a RAG2-like entity
in Transib and poor RAG2 binding by Transib in vitro*”®. 
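
A percentage identity such as the 16.4% quoted above is obtained by counting identical residues over the aligned positions of two sequences. The following is a minimal sketch, assuming the denominator is the number of columns in which neither sequence has a gap. Other conventions exist, and the text does not state which one was used. The names and sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical residues over columns with no gap in either
    sequence. Both inputs must come from the same alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if res_a == res_b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


print(percent_identity("MKV-LA", "MRVALA"))  # 80.0
```

Low sequence identity of this kind does not rule out structural similarity, which is why the domain superpositions are used as the supporting evidence.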




Fig. 2 | Structures of Transib-TIR complexes during transposon binding and
excision. a, Overall cryo-EM structure of Transib PRC with intact TIR
substrates. Two Transib subunits are coloured in orange and purple. b, Trans
architecture of Transib-TIR complex. DDBD and CTD from trans Transib are in
pale shades. Mg²⁺ ion, green sphere; catalytic carboxylates, red sticks; scissile
phosphate is highlighted in yellow. c, Overall cryo-EM structure of Transib HFC
with nicked TIR substrates. d, Comparison of TIR substrates from PRC and HFC.
e, Overall cryo-EM structure of Transib TEC with catalytically cleaved
transposon end DNAs. f, Transposon end nucleotides are stabilized by ZnB
domain residues, but the 3’-OH is not coordinated for the subsequent strand
transfer reaction.


Binding of intact TIR substrate to form the pre-reaction complex 
(PRC) induces a marked relocation of the ZnB domains, from being 
tightly packed components of the enzyme core to lateral extensions 
that jut away from the core (Fig. 2a, Extended Data Fig. 6a and Supplementary
Video 1). This 49° rotation and 26 Å centroid movement
of ZnB exposes the TIR-binding grooves (Extended Data Fig. 6a) and
is twice as large as the RAG1 ZnB-domain movement that occurs on
intact RSS binding”’. Viewed from the front, the Transib PRC resembles
a butterfly with wings spread and DNA as antennae, with ZnB domain
rotation constituting an ‘unfurling’ of the wings, one from the back and
the other from the front of the butterfly (Fig. 2a, Extended Data Fig. 6a).
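
Domain movements like the 49° rotation and 26 Å centroid shift quoted above are typically summarized from a rigid-body superposition of the two states. The following is a minimal sketch of the two read-outs, assuming the rotation matrix between the states is already known. How the values in the text were computed is not stated here, and all names and numbers below are illustrative.

```python
import math


def rotation_angle_deg(rot):
    """Rotation angle (degrees) of a 3x3 rotation matrix via its trace:
    trace(R) = 1 + 2*cos(theta)."""
    trace = rot[0][0] + rot[1][1] + rot[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard rounding
    return math.degrees(math.acos(cos_theta))


def centroid_shift(coords_before, coords_after):
    """Distance between the centroids of two coordinate sets."""
    def centroid(coords):
        n = len(coords)
        return [sum(axis) / n for axis in zip(*coords)]
    return math.dist(centroid(coords_before), centroid(coords_after))


# Illustrative input: a 49-degree rotation about the z axis.
theta = math.radians(49.0)
rot_z = [
    [math.cos(theta), -math.sin(theta), 0.0],
    [math.sin(theta), math.cos(theta), 0.0],
    [0.0, 0.0, 1.0],
]
print(round(rotation_angle_deg(rot_z), 1))  # 49.0

# Made-up coordinates translated by 26 Å along y.
before = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
after = [(0.0, 26.0, 0.0), (2.0, 26.0, 0.0)]
print(centroid_shift(before, after))  # 26.0
```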

The Transib PRC adopts a trans architecture in which each TIR 
engages the active site of one Transib (the cis subunit) but is bound 
primarily by the other Transib (the trans subunit) (Fig. 2b), similar to 
RAG and other DDE transposases and retroviral integrases”>'*. CTT 
from the cis subunit tracks through the heptamer major groove and 
interacts with the backbone of TIR position 3 (Extended Data Fig. 6b). 
Trans DNA binding interactions include base-specific interactions 
between CTD and TIR positions 5-7 and between DDBD and the phos- 
phate backbone at TIR positions 8-13 (Extended Data Fig. 6c, d). No 
interaction is observed beyond position 13, and consistent with this,
serial 3’ truncations of the TIR demonstrate that cleavage in vitro is
robust with only the first 13 bp of the TIR (Extended Data Fig. 1e).

The PRC is a cleavage-incompetent complex in which the scissile
phosphate for nicking is far from the active site and E435 is not positioned
for catalysis (Extended Data Fig. 6e). This indicates that a substantial
structural alteration will be required before nicking of the NTS
could take place.

Nature | Vol 575 | 21 November 2019 | 541

Fig. 3 | Transposon end integration and strand transfer complex.
a, Schematic of strand transfer product. Heptamer and target site sequences
are coloured red and green, respectively. b, Overall cryo-EM structure of
Transib in complex with naturally generated strand transfer product.
c, Different conformations of α9-α10 target site-binding loop in STC and HFC.
d, Interactions between the α9-α10 loop and the 5-bp target site. Hydrogen
bonds are shown as dashed lines. e, Sequence logo representing nucleotide
frequencies at Transib TIR-integration sites. f, Active site of the TCC model.
Mg²⁺ ions, green spheres. Nucleotide residues in target DNA are indicated with
subscript T. g, Sequence alignment of Transib transposases, vertebrate RAG1
and deuterostome invertebrate RAG1L proteins. Residue numbers and
secondary structure annotation are for H. zea Transib. The α9-α10 loop in
H. zea Transib is highlighted in green. Hs, Homo sapiens (human); Mm, Mus
musculus (mouse); Dr, Danio rerio (zebrafish); Gg, Gallus gallus (chicken); Bb,
Branchiostoma belcheri (amphioxus); Sp, Strongylocentrotus purpuratus
(purple sea urchin); Pf, Ptychodera flava (acorn worm); Pm, Petromyzon
marinus (sea lamprey); and Af, Asterias forbesi (sea star).


Transib closure accompanies catalysis


Incubation of Transib with nicked TIR substrate at 30 °C in Ca²⁺ yielded a
complex that is poised for hairpin formation, referred to as the hairpin-forming
complex (HFC). The HFC is more compact than the PRC, with
the ZnB domains having undergone a major 51° inward rotation along
an axis nearly perpendicular to that of the original outward movement
(Fig. 2c, Extended Data Fig. 6g and Supplementary Video 1).
This inward folding of the ZnB wings is accompanied by several other
changes in the complex. First, flanking DNA is rotated about 180° and
tilted around 30° towards the cis ZnB domain, with bases C1 and A-1*
becoming flipped out of the helix (Fig. 2d, Extended Data Fig. 6h-j). A
similar DNA rotation is seen in RAG-nicked RSS structures®”*. Second,
an approximately 6 Å movement of E435 has led to full assembly of
the active site (Extended Data Fig. 6k). Third, HFC formation results
in numerous new cis Transib-DNA contacts. The first 3 bp of the heptamer
make extensive base-specific interactions with helices α10 and
α16 of the cis subunit (Extended Data Fig. 6h) and the extrahelical C1
base is buried in a pocket formed by helices α10 and α12 and a loop of
CTT (Extended Data Fig. 6i). ZnB enfolds the flanking DNA (Fig. 2c) and
interacts with the first 7 bp of flanking DNA; in the PRC, such interactions
extended only to position -4 (Extended Data Fig. 6l, m). Owing
to its lack of a RAG2 subunit, interactions of Transib with flanking DNA
are much less extensive than for RAG, in which RAG2 contacts extend
to position -15 in the PRC and HFC’.


Transib reopens upon DNA cleavage 


The Transib transposon end complex (TEC) structure, in which hair- 
pin formation and release of flanking DNA has occurred, provides a 
view of post-cleavage events for hAT/RAG family enzymes. Release of 
the flanking DNA hairpin ends is associated with a 26° rotation of the 
ZnB domains that partially spreads the wings of the complex (Fig. 2e, 
Extended Data Fig. 6n and Supplementary Video 1). C1 of the heptamer
is switched from its flipped-out position to base pair with G1*, and transposon
end DNA has become largely superimposable with that in the
PRC (Extended Data Fig. 6o). In the absence of flanking DNA, the ZnB
domains are able to tilt and interact with the exposed heptamer ends, 
physically sequestering them through interactions involving N322, 
R343 and K350 (Fig. 2f). The 3’-OH that will be the nucleophile for the 
target-integration reaction is not in close proximity to the three active 
site carboxylates (Fig. 2f), indicating that substantial distortions of the 
transposon end and conformational changes in Transib will be neces- 
sary for the strand transfer reaction. 


Fig. 4 | Transib CTT conformational changes during transposition. a, b, Side-by-side comparison of five H. zea Transib structures, with ZnB and CTT domains
coloured in green and red, respectively.


The TEC structure illustrates how Transib prepares for target cap- 
ture and reveals several structural differences from RAG and other 
transposases. The substantial outward rotation of ZnB seen in the TEC
exposes the DNA-binding groove and probably facilitates flanking 
DNA release and target capture. No such movement is seen for RAG 
or Hermes”. In addition, the interactions of Transib with the cleaved 
transposon ends might shield the DNA from DNA repair enzymes and 
inhibit end joining. Similar interactions with the cleaved RSS ends are 
not seen in the RAG signal end complex (SEC)*, a difference that might
reflect the different evolutionary constraints faced by Transib and 
RAG. Finally, dislocation of the 3’-OH nucleophile out of the Transib 
active site in the TEC is not observed in the RAG SEC or TEC of other 
transposases?537, 


Target DNA capture and strand transfer 


During transposition, hairpin formation and flanking DNA release are fol- 
lowed by non-covalent capture of target DNA to form the target-capture 
complex (TCC) and then by the strand transfer reaction that covalently 
joins the transposon ends to target DNA to form the strand transfer com- 
plex (STC). The Transib TCC and STC were formed through cleavage of 
intact TIR substrates without the provision of a specific target DNA. One
3D class of Transib-TIR complexes contained a clearly resolved density 
connecting the catalytic centres of the two Transib subunits (Extended 
Data Fig. 7a), which was determined (see Methods) to be the 5-bp tar- 
get site generated after attack of the transposon ends at a 5’-CGGTG-3’ 
sequence in an additional TIR substrate molecule (Fig. 3a). 

The STC structure reveals that engagement of target DNA triggers 
active site reassembly driven by rotational closure of the ZnB domains, 
which now enfold target DNA in much the same manner that they previously
bound flanking DNA in the HFC (Fig. 3b, Extended Data Fig. 7b
and Supplementary Video 1). The α9-α10 loop has moved downward
towards the RNH domain (Fig. 3c) and interacts extensively with tar- 
get site DNA (Fig. 3d). Target site DNA exhibits sharp (approximately 
75°) bends 1 bp from each end, resulting in an overall directional change 
of about 150° (Fig. 3b, Extended Data Fig. 7c). V328 fills the gaps left by 
the breaks in base-stacking on the continuous strands, stabilizing the 
highly kinked DNA conformation (Fig. 3d). 

Transib exhibits a 5’-CGNCG-3’ transposition target site consensus 
sequence and target sites almost always contain a 5’-YR-3’ dinucleotide
step at one or both ends (Fig. 3e and Supplementary Table 1).
This preference is probably because of the inherent deformability
and reduced base-stacking of a pyrimidine-purine step”. Notably, the 
GC-rich Transib target site DNA remains fully base-paired in the STC 
despite its highly distorted duplex structure. 
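
The target-site preferences described above can be expressed as two simple sequence checks. The following is a minimal sketch, assuming standard IUPAC codes (N = any base, Y = C or T, R = A or G) and taking "ends" to mean the first two and last two bases of the 5-bp site. The function names are hypothetical.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT",  # any base
    "Y": "CT",    # pyrimidine
    "R": "AG",    # purine
}


def matches(site, pattern):
    """True if `site` fits the IUPAC `pattern` position by position."""
    site = site.upper()
    return len(site) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(site, pattern)
    )


def yr_steps(site):
    """Report whether the 5' and 3' terminal dinucleotides are 5'-YR-3' steps."""
    return matches(site[:2], "YR"), matches(site[-2:], "YR")


# Target site reported in the text for the observed strand transfer complex.
site = "CGGTG"
print(matches(site, "CGNCG"))  # False: position 4 is T, not C
print(yr_steps(site))          # (True, True): CG at the 5' end, TG at the 3' end
```

The example site departs from the consensus at one position yet still carries a pyrimidine-purine step at both ends, in line with the observation that such steps are present at almost all target sites.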

A trifurcation of density observed at the transposon end-target DNA
junction suggests that the cryo-EM map represents a mixture of Transib 
in complex with target DNA before (TCC) and after (STC) transposon 
end integration (Extended Data Fig. 7d). Indeed, calculation of the dif- 
ference map between the cryo-EM reconstruction and the STC model 
suggested that a proportion of the particles contain uncleaved target 
DNA (Extended Data Fig. 7e) and enabled modelling of intact target 
DNA in the cryo-EM density. In this TCC model, the active site captures
two Mg²⁺ ions (Fig. 3f), whereas none are observed in the disassembled
active site of the TEC (Fig. 2f). One non-bridging oxygen is hydrogen 
bonded with H274 (Fig. 3f). This histidine, which is conserved in several 
eukaryotic transposase superfamilies”’*°, has been proposed to be a
key component of a DDHE/D (as opposed to DDE/D) enzyme active 
site*°, and our data are consistent with this proposal. The distances 
separating the scissile phosphate and the attacking oxygen and the 
two metal ions in the TCC model strongly suggest that the active site 
could catalyse the strand-transfer reaction (Extended Data Fig. 7f). 


CTT helps to drive Transib domain closure 


During the two cycles of Transib opening and closing, CTT acts as an 
accordion-like element that extends and refolds in concert with the 
unfurling and furling of the wings of Transib (Fig. 4a, b and Supplementary
Video 1). In apo Transib, CTT is a compact bundle of three
short helices, α18-α20 (Fig. 4b). α20 is anchored through interactions
with helices α12 and α13 of ZnB and stays almost static relative
to ZnB throughout the transposition cycle (Extended Data Fig. 8a,
b). By contrast, helices α18 and α19 markedly alter their secondary
structures during the structural rearrangements of Transib. The large
rotation of ZnB that accompanies binding of intact TIR DNA substantially
elongates and deforms helices α18 and α19 (Fig. 4a, b). This CTT
coil might help to drive the inward movement of ZnB and closure of
the Transib dimer in the subsequent PRC-to-HFC transition, during
which helix α18 reforms (Fig. 4a, b). Helix α18 becomes deformed again
during Transib opening and flanking DNA release in the TEC and then
reforms during Transib closure and target DNA engagement in the
STC (Fig. 4a, b). Helix α18 is particularly well conserved across Transib
proteins, and the hydrophobic residues that anchor helix α20 to ZnB
also exhibit sequence conservation (Extended Data Fig. 8e). Hence, 
CTT is probably an ancient and functionally conserved component 
of many Transib transposases, and its deletion from Transib almost 
abolished DNA cleavage activity (Extended Data Fig. 8c). 

The C-terminal tails of jawed vertebrate RAG1 and invertebrate
RAG1-like (RAG1L) proteins, including the Branchiostoma belcheri
RAG1L subunit of the ProtoRAG transposase from amphioxus”,
show sequence similarity only with helix α18 of H. zea Transib CTT
(Extended Data Fig. 8e) and are unlikely to perform functions similar
to that of CTT. The RAG1 C-terminal tail is dispensable for activity, and
the functionally important B. belcheri RAG1L C-terminal tail interacts
with TIR DNA downstream of the heptamer and not with ZnB, and
shares no structural similarity with H. zea Transib CTT® (Extended
Data Fig. 8d). Hence, the CTT module of RAG1 family proteins has
apparently been readily adapted during evolution to address different 
functional imperatives. 


RAG2 acquisition and transposase evolution 


The lack of structural information for Transib has made it difficult to 
explore the structural and functional implications of the acquisition 
of a RAG2-like subunit by RAG1 early in evolution. The absence of RAG2 
is likely to be of particular relevance for the large domain excursions 
that characterize the Transib transposition reaction (Fig. 4a). The out- 
ward rotation of ZnB that accompanies initial DNA binding in the PRC 
provides extensive access to DNA-binding surfaces, thereby helping 
to compensate for the lack of stabilizing RAG2-flanking DNA interac- 
tions>”°. The subsequent domain closure that yields the Transib HFC 
creates ZnB-flanking DNA interactions (Extended Data Fig. 6m) that 
are contributed predominantly by RAG2 in the RAG HFC’. Interdimer 
interactions mediated by RAG2 stabilize the closed configuration of the 
RAG HFC® and their absence might help to explain the need for a unique 
CTT to help to drive inward rotation during Transib HFC formation. 

Perhaps most notably, the α9-α10 target site-interaction loop of H.
zea Transib (Fig. 3c, d), a nearly ubiquitous feature of predicted Transib
proteins, is absent in RAG1 and invertebrate RAG1-like proteins pre- 
dicted to have a RAG2-like partner (Fig. 3g). By stabilizing target DNA 
in the TCC and STC, the Transib target site-interaction loop probably 
compensates for the absence of stabilizing RAG2-DNA interactions.
We propose that acquisition of a RAG2-like gene by a Transib transposon
to give rise to the first RAG1-RAG2 transposon’ set in motion
two linked evolutionary processes in RAG1: acquisition of new RAG2-binding
interfaces (Extended Data Figs. 5a, 7g) and loss of the target
site-interaction loop, which was now no longer needed for the stabi- 
lization of target DNA. 

The structure of the H. zea Transib STC reveals distinctive structural 
and mechanistic features of cut-and-paste transposition. The large overall
target DNA distortion created by the deep binding pocket of Transib
contrasts with the relatively mild target DNA bend and flat target DNA- 
binding groove in retroviral integrases!” ” (Extended Data Fig. 7c). A 
second distinctive feature of Transib is the large protein conformational 
change that occurs during target DNA capture (Fig. 4a and Supplemen- 
tary Video 1). By contrast, Mos1”>™ and retroviral integrases””!** adopt 
very similar structures before and after target DNA capture. Finally, 
the Transib STC structure helps to explain multiple features of RAG- 
family transposition: the preferred 5-bp target site duplication length, 
GC-rich target sites*’°?”, target site hotspot sequence preferences”* 
and the ability of mismatches and other DNA distortions to stimulate 
transposition by RAG®”°. 


Online content 


Any methods, additional references, Nature Research reporting summaries,
source data, extended data, supplementary information,
acknowledgements, peer review information; details of author contributions
and competing interests; and statements of data and code
availability are available at https://doi.org/10.1038/s41586-019-1753-7.


1. Feschotte, C. & Pritham, E. J. DNA transposons and the evolution of eukaryotic genomes. 
Annu. Rev. Genet. 41, 331-368 (2007). 
2. Carmona, L. M. & Schatz, D. G. New insights into the evolutionary origins of the 
recombination-activating gene proteins and V(D)J recombination. FEBS J. 284, 1590- 
1605 (2017). 
3. Gellert, M. V(D)J recombination: RAG proteins, repair factors, and regulation. Annu. Rev. 
Biochem. 71, 101-132 (2002). 
4. Chen, S. & Li, X. Molecular characterization of the first intact Transib transposon from 
Helicoverpa zea. Gene 408, 51-63 (2008). 
5. Hencken, C. G., Li, X. & Craig, N. L. Functional characterization of an active Rag-like
transposase. Nat. Struct. Mol. Biol. 19, 834-836 (2012).
6. Craig, N. L. in Mobile DNA III (eds Craig, N. L. et al.) 3-39 (ASM Press, 2015).
7. Montano, S. P. & Rice, P. A. Moving DNA around: DNA transposition and retroviral
integration. Curr. Opin. Struct. Biol. 21, 370-378 (2011).
8. Kim, M. S., Lapkouski, M., Yang, W. & Gellert, M. Crystal structure of the V(D)J
recombinase RAG1-RAG2. Nature 518, 507-511 (2015).
9. Schatz, D. G. & Swanson, P. C. V(D)J recombination: mechanisms of initiation. Annu.
Rev. Genet. 45, 167-202 (2011).
10. Kapitonov, V. V. & Jurka, J. RAG1 core and V(D)J recombination signal sequences were 
derived from Transib transposons. PLoS Biol. 3, e181 (2005). 
11. Zhou, L. et al. Transposition of hAT elements links transposable elements and V(D)J 
recombination. Nature 432, 995-1001 (2004). 
12. Hickman, A. B. et al. Structural basis of hAT transposon end recognition by Hermes, an
octameric DNA transposase from Musca domestica. Cell 158, 353-367 (2014).
13. Agrawal, A., Eastman, Q. M. & Schatz, D. G. Transposition mediated by RAG1 and RAG2
and its implications for the evolution of the immune system. Nature 394, 744-751 (1998).
14. Hiom, K., Melek, M. & Gellert, M. DNA transposition by the RAG1 and RAG2 proteins: a
possible source of oncogenic translocations. Cell 94, 463-470 (1998).
15. Ru, H. et al. Molecular mechanism of V(D)J recombination from synaptic RAG1-RAG2
complex structures. Cell 163, 1138-1152 (2015).
16. Kim, M. S. et al. Cracking the DNA code for V(D)J recombination. Mol. Cell 70, 358-370
(2018).
17. Ru, H. et al. DNA melting initiates the RAG catalytic pathway. Nat. Struct. Mol. Biol. 25,
732-742 (2018).
18. Montano, S. P., Pigli, Y. Z. & Rice, P. A. The Mu transpososome structure sheds light on DDE
recombinase evolution. Nature 491, 413-417 (2012).
19. Maertens, G. N., Hare, S. & Cherepanov, P. The mechanism of retroviral integration from
X-ray structures of its key intermediates. Nature 468, 326-329 (2010).
20. Yin, Z. et al. Crystal structure of the Rous sarcoma virus intasome. Nature 530, 362-366
(2016).
21. Ballandras-Colas, A. et al. A supramolecular assembly mediates lentiviral DNA
integration. Science 355, 93-95 (2017).
22. Passos, D. O. et al. Cryo-EM structures and atomic model of the HIV-1 strand transfer
complex intasome. Science 355, 89-92 (2017).
23. Richardson, J. M., Colloms, S. D., Finnegan, D. J. & Walkinshaw, M. D. Molecular
architecture of the Mos1 paired-end complex: the structural basis of DNA transposition in
a eukaryote. Cell 138, 1096-1108 (2009).
24. Morris, E. R., Grey, H., McKenzie, G., Jones, A. C. & Richardson, J. M. A bend, flip and trap
mechanism for transposon integration. eLife 5, e15537 (2016).
25. Dawson, A. & Finnegan, D. J. Excision of the Drosophila mariner transposon Mos1.
Comparison with bacterial transposition and V(D)J recombination. Mol. Cell 11, 225-235
(2003).
26. Carmona, L. M., Fugmann, S. D. & Schatz, D. G. Collaboration of RAG2 with RAG1-like
proteins during the evolution of V(D)J recombination. Genes Dev. 30, 909-917 (2016).
27. Davies, D. R., Goryshin, I. Y., Reznikoff, W. S. & Rayment, I. Three-dimensional structure of
the Tn5 synaptic complex transposition intermediate. Science 289, 77-85 (2000).
28. Lankaš, F., Šponer, J., Langowski, J. & Cheatham, T. E. III. DNA basepair step deformability
inferred from molecular dynamics simulations. Biophys. J. 85, 2872-2883 (2003).
29. Yuan, Y. W. & Wessler, S. R. The catalytic domain of all eukaryotic cut-and-paste
transposase superfamilies. Proc. Natl Acad. Sci. USA 108, 7884-7889 (2011).
30. Hickman, A. B. et al. Structural insights into the mechanism of double strand break
formation by Hermes, a hAT family eukaryotic DNA transposase. Nucleic Acids Res. 46,
10286-10301 (2018).
31. Yang, W., Lee, J. Y. & Nowotny, M. Making and breaking nucleic acids: two-Mg²⁺-ion
catalysis and substrate specificity. Mol. Cell 22, 5-13 (2006).
32. Huang, S. et al. Discovery of an active RAG transposon illuminates the origins of V(D)J
recombination. Cell 166, 102-114 (2016).
33. Zhang, Y. et al. Transposon molecular domestication and the evolution of the RAG
recombinase. Nature 569, 79-84 (2019).
34. Hare, S., Gupta, S. S., Valkov, E., Engelman, A. & Cherepanov, P. Retroviral intasome
assembly and inhibition of DNA strand transfer. Nature 464, 232-236 (2010).
35. Tsai, C. L., Chatterji, M. & Schatz, D. G. DNA mismatches and GC-rich motifs target
transposition by the RAG1/RAG2 transposase. Nucleic Acids Res. 31, 6180-6190 (2003).
36. Lee, G. S., Neiditch, M. B., Sinden, R. R. & Roth, D. B. Targeted transposition by the V(D)J
recombinase. Mol. Cell. Biol. 22, 2068-2077 (2002).


Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in 
published maps and institutional affiliations. 


© The Author(s), under exclusive licence to Springer Nature Limited 2019 


Methods 


No statistical methods were used to predetermine sample size. The 
experiments were not randomized, and investigators were not blinded 
to allocation during experiments and outcome assessment. 


Cloning of H. zea Transib transposase and substrates 

The full-length or an N-terminal truncated fragment (residues 17-507) of
H. zea Transib transposase fused to a C-terminal His tag or an N-terminal
maltose-binding protein (MBP) tag was cloned into the pFastBac1 expression
vector (ThermoFisher Scientific) between the BamHI and HindIII restriction
sites. pB-5′/3′TIR, a derivative of pBR322 containing the TIR substrate for
ProtoRAG transposases, was described previously. To generate the TIR
substrate for Transib transposases, the ProtoRAG 5′TIR and 3′TIR of
pB-5′/3′TIR were substituted by the first 51 bp and 50 bp of the Transib
transposon 5′TIR and 3′TIR, respectively, using In-Fusion cloning
(Clontech). The PCR-amplified and linearized Transib substrate contains a
Transib 5′TIR and 3′TIR separated by 411 bp between their tips, 126 bp of
DNA flanking the 5′TIR and 276 bp of DNA flanking the 3′TIR. The whole
substrate was depleted of 5′-CAC-3′ sequence instances except for those
contained in the 5′TIR and 3′TIR regions.


Protein expression and purification 

MBP- or His-tagged Transib transposase was expressed in Sf9 insect cells
using the Bac-to-Bac Baculovirus Expression System (ThermoFisher
Scientific) according to the manufacturer's protocol. Cells expressing
His-tagged Transib transposase were resuspended in lysis buffer (20 mM
Tris-HCl, pH 7.5, 500 mM NaCl, 1 mM dithiothreitol (DTT)) and lysed by six
passes through an Emulsiflex C3 homogenizer (Avestin). Cell lysate was
cleared by centrifugation at 40,000 r.p.m. (~146,000g) using a Type 50.2 Ti
rotor (Beckman Coulter) for 1 h at 4 °C and was mixed with pre-equilibrated
Ni-NTA Agarose resin (Qiagen) for 2 h with continual rotation. The resin was
loaded onto a gravity flow column and washed with 5x column volume (CV) of
lysis buffer, and the protein was eluted with 5x CV of elution buffer (20 mM
Tris-HCl, pH 7.5, 200 mM NaCl, 20 mM imidazole, 1 mM DTT). The eluate was
further purified and buffer exchanged using a Superdex 200 Increase 10/300
GL size-exclusion chromatography column (GE Healthcare) in 20 mM Tris-HCl,
pH 7.5, 200 mM NaCl and 1 mM Tris(2-carboxyethyl)phosphine hydrochloride
(TCEP-HCl). Cells expressing MBP-tagged Transib transposase were resuspended
in lysis buffer (20 mM Tris-HCl, pH 7.5, 500 mM NaCl, 1 mM DTT) and the
protein was purified using amylose resin (New England BioLabs) in 20 mM
Tris-HCl, pH 7.5, 200 mM NaCl, 1 mM DTT, followed by size-exclusion
chromatography purification in 20 mM Tris-HCl, pH 7.5, 200 mM NaCl and 1 mM
TCEP. Both forms of Transib protein are dimers in solution and show
TIR-dependent nuclease activity (Extended Data Fig. 1).

Mutant Transib proteins with active site mutations or C-terminal tail (CTT)
truncation (removal of residues 478-507) were fused to an MBP tag and
purified in the same way as MBP-tagged wild-type Transib transposase.

His-tagged human HMGB1 with C-terminal truncation (residues 1-165) was
expressed in Escherichia coli BL21 (DE3) and purified as previously
described.

Sf9 cells were obtained from Thermo Fisher Scientific. Cell lines used were
not authenticated or tested for mycoplasma contamination.


Crystallization and data collection 

Purified His-tagged Transib transposase was concentrated to ~6.3 mg ml⁻¹
and used in crystallization screening. Transib crystals were grown by
sitting-drop vapour diffusion at 20 °C in 100 mM HEPES, pH 7.0, 0.7-0.8 M
NaH₂PO₄ and 0.75 M KH₂PO₄. Crystals were cryo-protected in crystallization
solution supplemented with 17.5% glycerol and flash frozen in liquid
nitrogen. Heavy atom derivatives of Transib crystals were prepared by
soaking crystals in cryo-protection solution supplemented with 1 M NaBr for
2-5 min, 0.5 M NaI for 2-5 min, 2.5 mM K₂OsCl₆ for 2 h, 2.5 mM K₂PtCl₄ for
2 h, or 2.5 mM ethylmercury thiosalicylate (EMTS) for 2 h. Data were
collected at 100 K at beamlines 24-ID-E and 24-ID-C of the Advanced Photon
Source at Argonne National Laboratory. The dataset for the native crystal
was collected at 0.9792 Å. The datasets for Br-, I-, Os-, Pt- and
Hg-derivative crystals were collected at 0.9197 Å, 1.4586 Å, 1.1398 Å,
1.0718 Å and 1.0087 Å, respectively. All X-ray diffraction data were
indexed, integrated and scaled with the XDS package³⁷ (Extended Data
Table 1).
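The collection wavelengths listed above can be converted to photon energies with E = hc/λ, which is often the more convenient unit when comparing beamline settings. The snippet below is a minimal sketch of that conversion; the function and dictionary names are illustrative and are not taken from any software used in this study.

```python
# Convert X-ray wavelengths (in angstroms) to photon energies (in keV).
# hc = 12.398419843 keV*angstrom (Planck constant times the speed of light).
HC_KEV_ANGSTROM = 12.398419843


def wavelength_to_energy_kev(wavelength_angstrom: float) -> float:
    """Return the photon energy in keV for a wavelength given in angstroms."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_angstrom


# Wavelengths reported in the text for the native and derivative datasets.
DATASET_WAVELENGTHS = {
    "native": 0.9792,
    "Br": 0.9197,
    "I": 1.4586,
    "Os": 1.1398,
    "Pt": 1.0718,
    "Hg": 1.0087,
}

if __name__ == "__main__":
    for name, wavelength in DATASET_WAVELENGTHS.items():
        energy = wavelength_to_energy_kev(wavelength)
        print(f"{name:>6}: {wavelength:.4f} A = {energy:.3f} keV")
```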


Crystal structure determination and refinement 

Phases were determined with the native crystal dataset and five
heavy-atom-derivative datasets by the multiple isomorphous replacement with
anomalous scattering (MIRAS) method. Heavy atom sites were identified using
SHELXD³⁸ and the structure was determined using AutoSol³⁹. The initial model
was built automatically using AutoBuild⁴⁰ of the PHENIX software package and
manually rebuilt in COOT⁴¹. The model was refined in PHENIX⁴² with
non-crystallographic symmetry (NCS) restraints. The final structure was
refined to 3.0 Å with R_work and R_free of 22.0% and 27.7%, respectively.
Owing to poor electron density, residues 17-20, 235-238, 247-264 and 502-507
were not included in the final model. The structure was validated with
MolProbity⁴³. 92.98% of residues are in the favoured regions of the
Ramachandran plot, 6.47% in additional allowed regions, and 0.56% in the
disallowed region.


Transib-TIR complex assembly 

The 24-bp intact TIR substrate was generated by annealing equimolar amounts
of two complementary oligonucleotides: 5′-CTAGATCTCACGGTGGATCGAAAA-3′ and
5′-TTTTCGATCCACCGTG*AGATCTAG-3′ (the heptamer is 5′-CACGGTG-3′ and its
complement; the asterisk indicates a phosphorothioate bond introduced
between the two nucleotide residues). The 32-bp intact TIR substrate was
generated by annealing equimolar amounts of the two oligonucleotides
5′-GATCTGGCCTAGATCTCACGGTGGATCGAAAA-3′ and
5′-TTTTCGATCCACCGTGAGATCTAGGCCAGATC-3′. The 32-bp nicked TIR substrate was
generated by annealing equimolar amounts of the following three
oligonucleotides: 5′-GATCTGGCCTAGATCT-3′, 5′-CACGGTGGATCGAAAA-3′ and
5′-TTTTCGATCCACCGTGAGATCTAGGCCAGATC-3′ (a phosphorothioate bond was
introduced between the heptamer and flanking DNA on the transferred strand
for the nicked TIR substrates used in Transib-TIR complex reconstitution in
the presence of Mg²⁺). To reconstitute the Transib-intact TIR complex,
purified MBP-tagged Transib was mixed with 24-bp intact TIR substrate and
HMGB1 in a 1:2:2 molar ratio in the presence of Mg²⁺ at 4 °C for 1 h,
followed by size-exclusion chromatography purification in 20 mM Tris-HCl,
pH 7.5, 50 mM KCl, 10 mM MgCl₂, 1 mM TCEP. The Transib-nicked TIR complex
was reconstituted by mixing MBP-tagged Transib, 32-bp nicked TIR substrate
and HMGB1 in a 1:2:2 molar ratio in the presence of Mg²⁺ at 4 °C or in the
presence of Ca²⁺ at 30 °C for 1 h, followed by size-exclusion
chromatography purification. Catalytically active Transib-TIR complex was
reconstituted by mixing MBP-tagged Transib with 32-bp intact TIR substrate
and HMGB1 in a 1:2:2 molar ratio in the presence of Mg²⁺, and was allowed to
react at 30 °C for 50 min before being frozen on cryo-EM grids.
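Because the substrate sequences are split across lines in the text, it can help to check that the oligonucleotides as transcribed really do anneal into blunt-ended duplexes. The following is a minimal sketch of such a check; the helper names are illustrative, and the phosphorothioate asterisk is omitted because it does not change base pairing.

```python
# Check that the TIR oligonucleotides listed in the text are exact reverse
# complements of one another (that is, they anneal into blunt duplexes).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals_blunt(top: str, bottom: str) -> bool:
    """True if the two strands pair along their full length with no overhang."""
    return reverse_complement(top) == bottom


# Sequences as given in the text (5' to 3').
TOP_24 = "CTAGATCTCACGGTGGATCGAAAA"
BOTTOM_24 = "TTTTCGATCCACCGTGAGATCTAG"  # phosphorothioate mark omitted
TOP_32 = "GATCTGGCCTAGATCTCACGGTGGATCGAAAA"
BOTTOM_32 = "TTTTCGATCCACCGTGAGATCTAGGCCAGATC"
# The nicked substrate replaces the top strand with two 16-nt oligonucleotides.
NICKED_TOP = ("GATCTGGCCTAGATCT", "CACGGTGGATCGAAAA")

if __name__ == "__main__":
    print("24-bp duplex anneals:", anneals_blunt(TOP_24, BOTTOM_24))
    print("32-bp duplex anneals:", anneals_blunt(TOP_32, BOTTOM_32))
    print("nicked top strand matches intact:", "".join(NICKED_TOP) == TOP_32)
```

The nick falls immediately 5′ of the heptamer, so the second short oligonucleotide begins with 5′-CACGGTG-3′.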


Cryo-EM sample preparation and data acquisition 

Purified Transib-TIR complex (3.5 µl at ~1.2 µM) was applied to freshly
glow-discharged Quantifoil 300 mesh or 200 mesh holey carbon grids with
R1.2/1.3 hole pattern (Electron Microscopy Sciences). Grids were blotted for
5.5 s under 100% humidity and plunge-frozen in liquid nitrogen-cooled liquid
ethane using a Vitrobot Mark IV (ThermoFisher Scientific). Cryo-EM datasets
were collected on a Titan Krios G2 electron microscope (Yale University)
operated at 300 kV equipped with a GIF Quantum LS imaging filter (Gatan) and
a K2 Summit direct electron detector (Gatan) in super-resolution mode. The
image stacks were collected at a nominal magnification of 130,000,
corresponding to 0.525 Å per super-resolution pixel, at a dose rate of
7.0-7.5 e⁻ per physical pixel per s. The total exposure time for each movie
was 8 s, thus leading to a total accumulated dose of 50.8-54.4 e⁻ Å⁻², which
was fractionated into 40 frames. All movies were recorded with a defocus
ranging from -1.5 to -2.5 µm. The statistics of cryo-EM data acquisition are
summarized in Extended Data Table 2.
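The accumulated dose quoted above follows from the dose rate, the exposure time and the physical pixel size. The sketch below reproduces that arithmetic under one stated assumption: that a physical pixel spans two super-resolution pixels along each axis (2 × 0.525 Å = 1.05 Å, the same value obtained after 2 × 2 binning). The function name is illustrative.

```python
# Reproduce the accumulated-dose arithmetic from the acquisition parameters.
SUPER_RES_PIXEL_A = 0.525                  # angstroms per super-resolution pixel
PHYSICAL_PIXEL_A = 2 * SUPER_RES_PIXEL_A   # assumption: 2 super-res pixels per physical pixel
EXPOSURE_S = 8.0                           # total exposure per movie, seconds
N_FRAMES = 40                              # frames per movie


def total_dose(dose_rate_e_per_px_s: float,
               exposure_s: float = EXPOSURE_S,
               pixel_size_a: float = PHYSICAL_PIXEL_A) -> float:
    """Accumulated dose in electrons per square angstrom."""
    electrons_per_pixel = dose_rate_e_per_px_s * exposure_s
    return electrons_per_pixel / pixel_size_a ** 2


if __name__ == "__main__":
    for rate in (7.0, 7.5):
        dose = total_dose(rate)
        print(f"{rate} e/px/s -> {dose:.1f} e/A^2 total, "
              f"{dose / N_FRAMES:.2f} e/A^2 per frame")
```

With the reported dose rates of 7.0 and 7.5 e⁻ per pixel per second, this gives 50.8 and 54.4 e⁻ Å⁻², matching the range stated in the text.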


Image processing 

Dose-fractionated super-resolution movies were binned over 2 x 2 pixels,
yielding a pixel size of 1.05 Å, then subjected to motion correction and
dose-weighting using MotionCor2⁴⁴. The non-dose-weighted aligned images were
used for contrast transfer function estimation by CTFFIND-4.1.10⁴⁵. The
dose-weighted images were used for autopicking, classification and
reconstruction. For Transib-TIR complex datasets, roughly 40,000 particles
were automatically picked using Laplacian-of-Gaussian blob detection in
RELION-3.0⁴⁶, followed by a round of 2D classification to generate templates
for a new round of autopicking. The newly autopicked particles were
subjected to multiple rounds of 2D classification in RELION-3.0 to remove
junk particles. Particles in good 2D classes were extracted for initial
model generation in RELION-3.0. The initial model was low-pass filtered to
50 Å to serve as a starting reference for 3D auto-refinement in RELION-3.0
using all particles in good 2D classes. The signal corresponding to MBP
regions was then subtracted, followed by 3D classification with a mask
encompassing the Transib transposase dimer plus the TIR DNA region. Good 3D
classes were selected and iteratively refined to yield high-resolution maps
in RELION-3.0 with either C1 or C2 symmetry. To improve the map quality and
interpretability of the Transib ZnB domains in the Transib-TIR PRC, the
particles from good 3D class(es) were symmetry-expanded and subjected to
masked 3D classification with residual signal subtraction focusing on the
Transib ZnB domain using a previously published procedure⁴⁷. All refinements
followed the gold-standard procedure, in which two half datasets were
refined independently. The overall resolution was estimated based on the
Fourier shell correlation (FSC) cut-off at 0.143 between the two half-maps,
after a soft mask was applied to mask out the solvent region. The final maps
were sharpened within RELION-3.0. Local resolution variation was estimated
from the two half-maps using ResMap⁴⁸.
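To illustrate the resolution criterion described above, the following is a minimal NumPy sketch of a Fourier shell correlation between two half-maps and of reading off the resolution at the 0.143 cut-off. It is not RELION's implementation: it assumes cubic, unmasked volumes, uses integer-radius shells and reports the first shell that falls below the threshold without interpolation. All function names are illustrative.

```python
import numpy as np


def fourier_shell_correlation(half1: np.ndarray, half2: np.ndarray) -> np.ndarray:
    """Return the FSC per integer frequency shell for two cubic volumes."""
    if half1.shape != half2.shape or half1.ndim != 3 or len(set(half1.shape)) != 1:
        raise ValueError("expected two cubic 3D volumes of identical shape")
    n = half1.shape[0]
    f1 = np.fft.fftn(half1)
    f2 = np.fft.fftn(half2)
    # Integer frequency index along each axis, then the radial shell index.
    k = np.fft.fftfreq(n) * n
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    shell = np.rint(np.sqrt(kx ** 2 + ky ** 2 + kz ** 2)).astype(int)
    curve = np.zeros(n // 2)
    for s in range(n // 2):
        mask = shell == s
        num = np.real(np.sum(f1[mask] * np.conj(f2[mask])))
        den = np.sqrt(np.sum(np.abs(f1[mask]) ** 2) * np.sum(np.abs(f2[mask]) ** 2))
        curve[s] = num / den if den > 0 else 0.0
    return curve


def resolution_at_threshold(curve, box_size: int, pixel_size_a: float,
                            threshold: float = 0.143) -> float:
    """Resolution (angstroms) of the first shell whose FSC drops below threshold."""
    for s in range(1, len(curve)):
        if curve[s] < threshold:
            return box_size * pixel_size_a / s
    # FSC never dropped below the threshold: report the Nyquist limit.
    return 2.0 * pixel_size_a


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    volume = rng.normal(size=(16, 16, 16))
    noisy = volume + rng.normal(scale=2.0, size=volume.shape)
    fsc = fourier_shell_correlation(volume, noisy)
    print(np.round(fsc, 2))
    print(resolution_at_threshold(fsc, box_size=16, pixel_size_a=1.05))
```

Two identical volumes give an FSC of 1 in every shell; independent noise in the two halves lowers the correlation, and the shell at which it crosses 0.143 sets the reported resolution.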


Cryo-EM model building and refinement 

The crystal structure of the Transib dimer was rigid-fitted into the
Transib-TIR complex cryo-EM maps in UCSF Chimera⁴⁹. Owing to large domain
movements, the Transib ZnB domains were fitted separately from the other
part of the structural model. The DNA fragments corresponding to the
heptamer plus the first 16 bp of coding flank from RAG-RSS PRC (PDB 6CIK) or
HFC (PDB 5ZEO) structures were first fitted into the Transib PRC or HFC
cryo-EM map, respectively, and mutated to the input TIR sequence in COOT.
The complex resulting from incubation of Transib with nicked TIR substrate
at 4 °C in the presence of Mg²⁺ adopted a catalytically incompetent
conformation very similar to that of the Transib-intact TIR complex
(Extended Data Fig. 6f). This complex is referred to as the PRC with nicked
TIRs. For the Transib STC structure, the modelling and sequence registers of
the target DNA are based on the following observations. (1) The well-defined
cryo-EM density for the target site suggests a 5′-YRRYR-3′ motif (Y,
pyrimidine; R, purine) (Extended Data Fig. 7a); 5′-CGGTG-3′ is the only
match throughout the entire sequence of the input TIR substrate DNA. (2)
Transib prefers GC-rich target sites for integration. In vitro transposition
has shown that HzTransib can mediate transposon integration at target sites
with an exact 5′-CGGTG-3′ sequence (Supplementary Table 1). (3)
Reconstruction of the HzTransib STC cryo-EM map without imposing C2 symmetry
shows asymmetric DNA helix density at the two flanking DNA-binding regions
in the HzTransib dimer. The cryo-EM density for the two flanking site DNA
helices exhibits a 7-9 bp difference in length, which coincides with the
18-bp and 9-bp flanking DNA on the two sides of the 5′-CGGTG-3′ sequence in
our TIR DNA substrate. By contrast, reconstructing the cryo-EM map of other
HzTransib-TIR complexes without C2 symmetry results in a map with nearly
perfect two-fold symmetry. (4) The sequence registers for target DNA in this
model are also largely supported by the cryo-EM density features. The
structural models were manually adjusted and rebuilt in COOT and refined
using PHENIX real-space refinement with secondary structure restraints,
rotamer restraints, Ramachandran restraints and NCS constraints (except for
HzTransib-TIR STC, in which no NCS was applied). The final structures were
validated with MolProbity. The final HFC and STC structures contain amino
acid residues 21-500 of HzTransib and most TIR DNA nucleotides, except for
the two most distal base pairs of the transposon end-flanking DNA and the 5′
end of target DNA. In PRC structures, residues 17-20, 131-141 and 245-252 of
HzTransib and the most distal base pair of the transposon end-flanking DNA
are not modelled owing to poor density. The TEC model contains all 16 bp of
transposon end DNA. HzTransib residues 17-20, 136-141 and 245-265 are
disordered and are not included in the final TEC model. No HMGB1 density was
seen in any of the cryo-EM density maps, and thus HMGB1 was not included in
the cryo-EM atomic models. All molecular representations were generated in
UCSF Chimera and UCSF ChimeraX⁵⁰. Sequence alignments were performed in
Clustal Omega⁵¹ and displayed using the online server of ESPript 3.0⁵².
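Observations (1) and (3) above can be checked directly against the substrate sequence. The sketch below searches both strands of the 32-bp intact TIR substrate for the degenerate 5′-YRRYR-3′ motif and reports the lengths of the DNA on either side of the match. It assumes that the 32-bp intact substrate given in the complex assembly section is the relevant input DNA; the helper names are illustrative.

```python
import re

# IUPAC degenerate codes needed for the motif (Y = pyrimidine, R = purine).
IUPAC = {"Y": "[CT]", "R": "[AG]"}


def find_motif(seq: str, motif: str):
    """Return (0-based start, matched sequence) for every motif occurrence.

    A lookahead is used so that overlapping occurrences are also reported.
    """
    pattern = "".join(IUPAC.get(base, base) for base in motif)
    regex = re.compile("(?=(" + pattern + "))")
    return [(m.start(), m.group(1)) for m in regex.finditer(seq)]


# Both strands of the 32-bp intact TIR substrate, written 5' to 3'.
TOP_STRAND = "GATCTGGCCTAGATCTCACGGTGGATCGAAAA"
BOTTOM_STRAND = "TTTTCGATCCACCGTGAGATCTAGGCCAGATC"

if __name__ == "__main__":
    top_hits = find_motif(TOP_STRAND, "YRRYR")
    bottom_hits = find_motif(BOTTOM_STRAND, "YRRYR")
    print("top strand:", top_hits)
    print("bottom strand:", bottom_hits)
    for start, site in top_hits:
        left = start
        right = len(TOP_STRAND) - start - len(site)
        print(f"{site}: {left} bp on one side, {right} bp on the other")
```

On this sequence the only match is 5′-CGGTG-3′ on the first strand, with 18 bp on one side and 9 bp on the other, consistent with the flanking lengths cited in observation (3).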


In vitro DNA cleavage assay 

Linear substrate DNA used in the cleavage experiments was generated by PCR
using pBR322-based vectors as template and purified by agarose gel
electrophoresis. Wild-type or mutant HzTransib (300 nM final concentration)
and substrate DNA (30 nM final concentration) were incubated in reaction
buffer (25 mM MOPS, pH 7.0, 50 mM KCl, 2 mM DTT, 5 mM MgCl₂; 16 µl final
reaction volume) at 30 °C for 1 h. Reactions were stopped by adding 1.25 µl
2.5% SDS, 5 µl proteinase K (200 µg ml⁻¹) and 2 µl 0.5 M EDTA followed by
incubation at 55 °C for 3 h. Samples were briefly centrifuged and the
supernatant mixed with 6 µl 5x high-density Tris-borate-EDTA (TBE) sample
buffer (ThermoFisher Scientific) and loaded on a non-denaturing 1x
TBE-buffered polyacrylamide gel (Bio-Rad or ThermoFisher Scientific). After
35 min electrophoresis at 160 V, gels were stained with SYBR Gold
(ThermoFisher Scientific) in 1x TBE buffer for 1 h and imaged using a
PharosFX Plus (Bio-Rad).
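For planning a reaction of this kind, the stated final concentrations and volume translate into absolute amounts as follows. This is a minimal sketch of the unit arithmetic only (nM × µl gives fmol); the function name is illustrative.

```python
# Convert final concentrations in the 16-ul cleavage reaction into amounts.
REACTION_VOLUME_UL = 16.0
PROTEIN_NM = 300.0   # HzTransib, final concentration
DNA_NM = 30.0        # substrate DNA, final concentration


def amount_pmol(conc_nm: float, volume_ul: float) -> float:
    """Amount in pmol: nM x ul gives fmol, and 1,000 fmol make 1 pmol."""
    return conc_nm * volume_ul / 1000.0


if __name__ == "__main__":
    protein = amount_pmol(PROTEIN_NM, REACTION_VOLUME_UL)
    dna = amount_pmol(DNA_NM, REACTION_VOLUME_UL)
    print(f"protein: {protein:.2f} pmol")
    print(f"DNA:     {dna:.2f} pmol")
    print(f"protein:DNA molar ratio = {protein / dna:.0f}:1")
```

Each reaction therefore contains about 4.8 pmol of transposase and 0.48 pmol of substrate DNA, a tenfold molar excess of protein.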


In vitro transposition assay 

Linear donor DNA with a tetracycline-resistance gene was amplified by PCR
using the pBR322-based vector as template and purified by agarose gel
electrophoresis. 0.05 pmol donor DNA and 0.1 pmol pECFP-1 target plasmid
were mixed with 150 ng wild-type HzTransib protein in reaction buffer (25 mM
MOPS, pH 7.0, 50 mM KCl, 2 mM DTT, 5 mM MgCl₂) and incubated at 30 °C for
1 h. After proteinase K digestion, DNA was ethanol-precipitated. 200 ng of
DNA was transformed into electrocompetent MC1061 bacterial cells that were
spread onto plates containing kanamycin or kanamycin + tetracycline +
streptomycin (KTS). Plasmids from 54 colonies from KTS plates were sequenced
to determine the integration location on the plasmid and the target site
duplication (TSD) sequence. Sequence logos representing nucleotide
frequencies of HzTransib TSDs were generated and visualized with the kpLogo
web server⁵³.


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 


Data availability 

Atomic coordinates of six HzTransib or HzTransib-TIR DNA complex structures
have been deposited in the PDB under accession numbers 6PQN (HzTransib apo),
6PQR (HzTransib intact TIR PRC), 6PQU (HzTransib nicked TIR PRC), 6PQX
(HzTransib TIR HFC), 6PQY (HzTransib TIR TEC) and 6PRS (HzTransib TIR STC).
Five cryo-EM density maps of HzTransib complexed with different TIR DNA have
been deposited in the Electron Microscopy Data Bank under accession numbers
EMD-20452, EMD-20453, EMD-20455, EMD-20456 and EMD-20457, respectively.


37. Kabsch, W. XDS. Acta Crystallogr. D 66, 125-132 (2010).

38. Sheldrick, G. M. A short history of SHELX. Acta Crystallogr. A 64, 112-122 (2008). 

39. Terwilliger, T. C. et al. Decision-making in structure solution using Bayesian estimates of 
map quality: the PHENIX AutoSol wizard. Acta Crystallogr. D 65, 582-601 (2009). 

40. Terwilliger, T. C. et al. Iterative model building, structure refinement and density 
modification with the PHENIX AutoBuild wizard. Acta Crystallogr. D 64, 61-69 (2008). 

41. Emsley, P., Lohkamp, B., Scott, W. G. & Cowtan, K. Features and development of Coot. 
Acta Crystallogr. D 66, 486-501 (2010). 

42. Adams, P. D. et al. PHENIX: a comprehensive Python-based system for macromolecular 
structure solution. Acta Crystallogr. D 66, 213-221 (2010). 

43. Chen, V.B. et al. MolProbity: all-atom structure validation for macromolecular 
crystallography. Acta Crystallogr. D 66, 12-21 (2010). 

44. Zheng, S. Q. et al. MotionCor2: anisotropic correction of beam-induced motion for 
improved cryo-electron microscopy. Nat. Methods 14, 331-332 (2017). 

45. Rohou, A. & Grigorieff, N. CTFFIND4: fast and accurate defocus estimation from electron 
micrographs. J. Struct. Biol. 192, 216-221 (2015). 

46. Zivanov, J. et al. New tools for automated high-resolution cryo-EM structure 
determination in RELION-3. eLife 7, e42166 (2018). 

47. Bai, X. C., Rajendra, E., Yang, G., Shi, Y. & Scheres, S. H. Sampling the conformational 
space of the catalytic subunit of human y-secretase. eLife 4, e11182 (2015). 

48. Kucukelbir, A., Sigworth, F. J. & Tagare, H. D. Quantifying the local resolution of cryo-EM 
density maps. Nat. Methods 11, 63-65 (2014). 

49. Pettersen, E. F. et al. UCSF Chimera—a visualization system for exploratory research and 
analysis. J. Comput. Chem. 25, 1605-1612 (2004). 

50. Goddard, T. D. et al. UCSF ChimeraX: Meeting modern challenges in visualization and 
analysis. Protein Sci. 27, 14-25 (2018). 

51. Sievers, F. & Higgins, D. G. Clustal Omega. Curr. Protoc. Bioinformatics 48,
3.13.1-3.13.16 (2014).


52. Robert, X. & Gouet, P. Deciphering key features in protein structures with the new 
ENDscript server. Nucleic Acids Res. 42, W320-W324 (2014). 

53. Wu, X. & Bartel, D. P. kpLogo: positional k-mer analysis reveals hidden specificity in 
biological sequences. Nucleic Acids Res. 45, W534-W538 (2017). 


Acknowledgements We thank W. Eliason for assistance with size-exclusion chromatography-
multiple angle light scattering; K. Zhou for assistance in freezing the cryo-EM grids of
HzTransib-intact TIR complex; S. Wu for help with cryo-EM data collection at Yale West
Campus; the staff of the Advanced Photon Source beamlines 24-ID-C and 24-ID-E for
technical assistance with X-ray crystallography data collection; and N. Craig for critical
reading and many helpful comments on the manuscript. We are grateful for the advice,
mentoring and support from T. Steitz during the early phases of this work. This work was
supported by NIH grant R01 AI137079 (D.G.S.), Yale University School of Medicine James
Hudson Brown-Alexander Brown Coxe Postdoctoral Fellowship (C.L.) and NVIDIA GPU Grant
Program (C.L. and Y.Y.).


Author contributions C.L. and D.G.S. conceived the project and designed the experiments.
C.L. performed cloning, protein expression, purification, complex reconstitution, sample
screening using negative-stain electron microscopy, cryo-EM grid preparation and functional
assays. C.L. and Y.Y. carried out protein crystallization, crystal structure determination,
cryo-EM data collection and processing, atomic model building and refinement. C.L. and
D.G.S. analysed data and wrote the manuscript with input from Y.Y.


Competing interests The authors declare no competing interests. 


Additional information 

Supplementary information is available for this paper at
https://doi.org/10.1038/s41586-019-1753-7.

Correspondence and requests for materials should be addressed to D.G.S. 

Peer review information Nature thanks Orsolya Barabas, Thomas Boehm, Ronald Chalmers 
and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. 
Reprints and permissions information is available at http://www.nature.com/reprints. 


[Extended Data Fig. 1 image: panels a-f. Recoverable labels: a, UV A280 and
molecular weight versus retention volume (ml); b, 5′TIR and 3′TIR sequences
with the heptamer marked; c, 126 bp, 5′TIR, 411 bp, 3′TIR, 276 bp; d, e, gel
lanes 1-9 (5′TIR/3′TIR, 5′TIR, 3′TIR); f, 243,518 particles, 3D
classification and 3D refinement with C2 symmetry, symmetry expansion
(165,380 particles), masked 3D classification with residual signal
subtraction, 32,894 particles (C2 refinement) and 58,111 particles (C1
refinement). Caption follows.]


Extended Data Fig. 1 | Biochemical characterization of H. zea Transib
transposase and single-particle cryo-EM analysis of Transib in complex with
intact TIR substrates. a, Size-exclusion chromatography-multiple angle light
scattering analysis of purified Transib protein, indicating that it forms a
dimer in solution. Size-exclusion chromatography was repeated three times
and similar profiles were obtained. The multiple angle light scattering
experiment was not repeated. b, Numbering and sequence of the endogenous
left end (5′TIR) and right end (3′TIR) of the Transib transposon with
nucleotide differences in black boxes. The first 16 bp of the TIR sequence
are the same as the 16-bp transposon end of the TIR substrates used in
structure determination. c, Schematic of the TIR substrate DNA used in the
in vitro DNA-cleavage assay. 5′TIR and 3′TIR are shown as yellow and purple
triangles, respectively. d, Cleavage of DNA substrates bearing one or two
TIRs by MBP-tagged wild-type or mutant Transib transposases, each with the
N-terminal 16 amino acids removed. The experiment was repeated three times
and similar results were obtained. For gel source data, see Supplementary
Fig. 1. e, Cleavage of DNA substrates bearing either full-length (lanes 1
and 2) or truncated (lanes 3-8) 5′TIR or 3′TIR, with the site of truncation
indicated in the substrate name. The experiment was repeated three times and
similar results were obtained. Open and closed arrowheads indicate single
5′TIR and single 3′TIR cleavage products, respectively. The red asterisk
marks the double cleavage band. The DNA cleavage products were resolved in
5% TBE polyacrylamide gels and stained with SYBR Gold. f, Flow chart of
cryo-EM structure determination of Transib in complex with intact TIR
substrates. After the first round of 3D classification, 3D auto-refinement
using all of the particles in the best class generated a 3.3 Å map. Further
3D classifications focusing on either two ZnB domains plus flanking DNA
regions or on one ZnB domain with symmetry expansion were used to obtain the
final 3.4 Å map or a 3.5 Å map with clear ZnB domain density. All three maps
were used as cross-references for model building. The final map and
accompanying local resolution illustrations are enclosed in the dashed black
box.

Extended Data Fig. 4 | Validation of cryo-EM structural models. a, Half-map
FSC and model-map FSC curves of five cryo-EM maps from this study were
generated from MolProbity. Gold-standard FSC curves between the two half
[...] the atomic model and the final map with indicated resolution at
FSC = 0.5 are in orange. b, Cryo-EM densities superimposed on the atomic
model for representative regions of Transib and TIR complexes.

Extended Data Fig. 5 | Structural comparison of Transib with RAG1.


a, Superimposition of individual domains from Transib and RAG1 structures.
Because the ZnC2 portion of the ZnB domain is missing from the Transib apo
structure, the ZnB domain from Transib STC was used for structural
superimposition. Three structural motifs in RAG1 that are responsible for
RAG2 interactions are highlighted in red boxes. b, The front and top views
of Transib and RAG1 dimers superimposed by their DDBD domains. c, Front and
top view of the apo RAG1-RAG2 heterotetramer structure (PDB 4WWX)⁸.

Extended Data Fig. 6 | TIR recognition in Transib PRC, HFC and TEC.

a, Superimposition of Transib dimer in PRC (dark colours) and apo (pale 
colours) structures by their DDBD illustrates the large conformational changes 
of ZnB domains (green in one subunit). b-e, TIR recognition in Transib PRC. 

b, Interactions between Transib CTT and the heptamer. Hydrogen bonds are 
shownas grey dotted lines. Labels for nucleotide residues are italic. 

c, Interactions between Transib and last three base pairs of heptamer. 

d, Interactions between Transib and transposon end DNA downstream of 
heptamer.e, Active site of Transib PRC structure. Distances between Mg” ion 
and scissile phosphate or E435 are indicated. f, The front and top views of two 
Transib PRC structures (incubated with either intact or nicked TIRs at 4 °C) 
superimposed by their DDBD domains. The Transib nicked PRC complex is 
referred to as a PRC because ofits strong structural resemblance to the intact 
DNA PRC. Depending on reaction conditions (temperature and divalent cation; 
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see Methods), the nicked TIR substrate can be incorporated into either a nicked 
PRC or the HFC. g, Superimposition of Transib dimer in HFC and PRC structures 
by their DDBD shows the inward movements of ZnB domains and dimer closure. 
h-k, TIR recognition in Transib HFC. h, Interactions between Transib and the 
first three base pairs of heptamer. i, The first nucleotide of the heptamer (C1) is 
flipped out and buried ina pocket.j, Interactions between Transib a9-a10 loop 
and TIRat heptamer-flanking DNA junction. k, Active site of Transib HFC 
structure. I, Interactions between Transib and TIR flanking DNA in PRC. 

m, Interactions between Transib ZnB domain and TIR flanking DNA in HFC. 

n, Superimposition of Transib dimer in TEC and HFC structures by their DDBD 
shows the outward movements of ZnB domains. 0, Comparison of transposon 
end DNA in TEC to that in HFC or in PRC. Mg” and Ca” ions are green and slate 
grey, respectively; other structure elements are coloured as in Fig. 2b. Scissile 
phosphate in each structure is highlighted in yellow. 
Extended Data Fig. 7 | Validation and analysis of Transib STC structure.
a, Superimposition of 5-bp TSD region with the cryo-EM map contoured at 5.5σ.
b, Front and top views of Transib STC structure superimposed on the Transib
HFC structure. c, Comparison of target DNA from H. zea Transib, retrovirus
integrases, Mos1 transposase and Mu transposase STC structures. Target site
DNAs are shown as green and red. The approximate degree of bending in each
target DNA is indicated. H. zea Transib is the only DDE/D-family
transposase–integrase for which an STC structure has been reported that lacks a
bend or base-unpairing at the centre of the target site DNA. Instead, Transib
strongly bends target DNA near both edges of the target site DNA (between
positions −2 and −1 and positions 1 and 2), leading to a total 150° directional
change of target DNA. Target DNAs in retroviral integrase STC structures exhibit
relatively mild bends with one backbone kink at the centre of target site DNA,
regardless of its length (ranging from 4 bp in PFV integrase to 6 bp in RSV
integrase). The sharp bending (about 150°) at the centre of the Mos1 target DNA
is achieved by flipping of the adenines in the TA target site. The target DNA in
Mu STC exhibits a more continuous bending pattern through the 5-bp target site
DNA, with one bend before the target site (between positions −3 and −2), one at
the centre and one immediately after the target site DNA (between positions 2
and 3). The central bend is facilitated by the T-T mismatch in the target site.
d, Transposon end-target DNA junction region of the Transib STC model
superimposed on the cryo-EM map contoured at 5.5σ. Nucleotide residues in target
DNA are labelled with a subscript T. e, Difference density between the Transib
STC cryo-EM map and the model, showing the uncleaved target DNA phosphodiester
bond in a portion of the particles used for cryo-EM map reconstruction. The
difference map was contoured at 6σ. f, Superimposition of Transib TCC
(protein in orange and metal ions in green) active site with Transib HFC active
site (protein in purple and metal ions in grey). Distances are expressed in Å.
Attacking oxygen atoms in HFC and TCC are highlighted in black and red
circles, respectively. In TCC, the phosphorus is 2.4 Å from the attacking oxygen
and the two metal ions are 3.2 Å apart. These distances are 3.6 Å and 4.2 Å in
HFC. g, Sequence alignment of Transib transposases, vertebrate RAG1 and
deuterostome invertebrate RAG1L proteins, showing the regions
corresponding to three RAG2-binding interfaces in RAG1. Residue numbers are
for H. zea Transib. Hs, Homo sapiens (human); Mm, Mus musculus (mouse); Dr,
Danio rerio (zebrafish); Gg, Gallus gallus (chicken); Bb, Branchiostoma belcheri
(amphioxus); Sp, Strongylocentrotus purpuratus (purple sea urchin); Pf,
Ptychodera flava (acorn worm); Pm, Petromyzon marinus (sea lamprey) and Af,
Asterias forbesi (sea star).
Extended Data Fig. 8 | Structural insights into the function and evolution of
H. zea Transib CTT. a, Interactions between Transib CTT α20 and ZnB domain
α12–α13. Residues in CTT and ZnB are coloured red and green, respectively.
Residues involved in hydrophobic interactions are shown in ball-and-stick
representation. b, Superimposition of ZnB domain (pale colours) together with
CTT α20 (dark colours) from the structures representing five steps in
transposition. c, Cleavage of DNA substrates bearing a 5′ TIR–3′ TIR pair by
MBP-tagged wild-type or CTT truncated mutant Transib transposases, each with
the N-terminal 16 amino acids removed. The DNA cleavage products were resolved
on a 6% TBE polyacrylamide gel and stained with SYBR Gold. Open and closed
arrowheads indicate single 5′ TIR and single 3′ TIR cleavage products,
respectively. The red asterisk marks the double-cleavage band. The
experiment was repeated at least three times independently and similar results
were obtained. For gel source data, see Supplementary Fig. 1. d,
Superimposition of Transib, RAG1 and BbRAG1L structures by the first two
helices of their CTDs. Transib and BbRAG1L CTT extend from the structurally
conserved CTD and point in different directions. e, Sequence alignment of H.
zea Transib CTT with vertebrate RAG1 CTT and deuterostome invertebrate
RAG1L CTT showing highly divergent sequences among the three groups.
Residues mediating the hydrophobic interactions between ZnB α12–α13 and
CTT α20 are highlighted in green. Residue numbers and secondary structure
elements at the top of the sequence alignment are for H. zea Transib. The
residue number for the final amino acid in the sequence alignment is indicated
for selected sequences.
Extended Data Table 1 | Statistics of crystal data collection, phasing and refinement

                    HzTransib apo    HzTransib apo    HzTransib apo    HzTransib apo    HzTransib apo    HzTransib apo
                    native           Br derivative    I derivative     Os derivative    Pt derivative    Hg derivative
                    (PDB 6PQN)

Data collection
Space group         P6122            P6122            P6122            P6122            P6122            P6122
Cell dimensions
  a, b, c (Å)       160.292,         159.315,         159.797,         160.238,         160.912,         160.353,
                    160.292,         159.315,         159.797,         160.238,         160.912,         160.353,
                    235.858          236.805          238.293          235.817          236.643          236.264
  α, β, γ (°)       90, 90, 120      90, 90, 120      90, 90, 120      90, 90, 120      90, 90, 120      90, 90, 120
Resolution (Å)      200.0–3.01       200.0–3.84       200.0–3.81       200.0–3.18       200.0–3.96       200.0–3.14
                    (3.19–3.01)*     (4.30–3.84)      (4.17–3.81)      (3.35–3.18)      (4.42–3.96)      (3.31–3.14)
Rsym or Rmerge      0.086 (1.829)    0.280 (3.748)    0.196 (3.157)    0.135 (3.403)    0.141 (2.482)    0.134 (3.983)
I/σ(I)              14.47 (0.75)     21.46 (0.9)      19.87 (1.2)      21.18 (0.9)      22.45 (1.1)      21.4 (0.9)
Completeness (%)    99.3 (99.3)      99.7 (99.1)      100.0 (99.8)     100.0 (100.0)    99.7 (99.1)      99.8 (99.0)
Redundancy          7.7 (7.6)        12.7 (12.7)      19.2 (19.9)      19.3 (19.1)      1237 (1323)      19.5 (20.0)

Refinement
Resolution (Å)      80.15–3.01
                    (3.117–3.01)
No. reflections     35894 (3455)
Rwork / Rfree       0.220/0.277
No. atoms
  Protein           7288
  Ligand/ion        83
  Water             28
B-factors
  Protein           147.85
  Ligand/ion        183.99
  Water             108.59
R.m.s. deviations
  Bond lengths (Å)  0.004
  Bond angles (°)   0.77

*One crystal was used for each dataset; values in parentheses are for the highest-resolution shell.


Extended Data Table 2 | Statistics of cryo-EM data collection, refinement and validation

                                PRC (intact TIR)  PRC (nicked TIR)  HFC               TEC               STC
                                (EMD-20452)       (EMD-20453)       (EMD-20455)       (EMD-20456)       (EMD-20457)
                                PDB 6PQR          PDB 6PQU          PDB 6PQX          PDB 6PQY          PDB 6PR5

Data collection and processing
Magnification                   130,000           130,000           130,000           130,000           130,000
Voltage (kV)                    300               300               300               300               300
Electron exposure (e-/Å²)       50.8              52.2              52.2              54.4              54.4
Defocus range (μm)              -1.5 to -2.5      -1.5 to -2.5      -1.5 to -2.5      -1.5 to -2.5      -1.5 to -2.5
Pixel size (Å)                  1.05              1.05              1.05              1.05              1.05
Symmetry imposed                C2                C2                C2                C2                C1
Initial particle images (no.)   243,518           300,406           262,691           228,413           228,413
Final particle images (no.)     32,984            59,333            3,997             26,397            43,661
Map resolution (Å)              3.4               3.3               4.6               4.2               3.3
  FSC threshold                 0.143             0.143             0.143             0.143             0.143
Map resolution range (Å)        2.4–5.2           2.4–5.6           4.0–8.0           3.7–6.8           2.5–5.6

Refinement
Initial model used (PDB code)   6PQN              6PQN              6PQN              6PQN              6PQN
Model resolution (Å)            3.7               3.6               4.8               4.7               3.5
  FSC threshold                 0.5               0.5               0.5               0.5               0.5
Model resolution range (Å)      2.4–5.2           2.4–5.6           4.0–8.0           3.7–6.8           2.5–5.6
Map sharpening B factor (Å²)    -90               -90               -126              -120              -90
Model composition
  Non-hydrogen atoms            9266              10004             10116             8690              10194
  Protein residues              936               936               960               920               960
  Nucleotides                   88                124               120               64                124
  Ligands                       6                 6                 4                 0                 6
B factors (Å²)
  Protein                       105.51            74.58             89.52             182.34            59.68
  Nucleic acid                  104.92            131.57            160.90            158.19            91.35
  Ligand                        109.58            93.13             81.45             -                 69.54
R.m.s. deviations
  Bond lengths (Å)              0.009             0.008             0.006             0.007             0.008
  Bond angles (°)               0.830             0.890             0.984             0.980             0.768

Validation
MolProbity score                1.91              2.04              2.39              2.35              1.64
Clashscore                      16.01             23.12             16.06             24.55             13.64
Poor rotamers (%)               0.24              1.21              2.39              0.49              0.71
Ramachandran plot
  Favored (%)                   96.74             97.39             94.14             92.51             98.33
  Allowed (%)                   3.26              2.39              5.23              6.83              1.36
  Disallowed (%)                0                 0.22              0.63              0.66              0.31
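
Extended Data Table 2 quotes map resolution at an FSC threshold of 0.143 and model resolution at a threshold of 0.5. The following is a minimal sketch of how a resolution can be read off an FSC curve at a chosen threshold, using linear interpolation between shells. The function name and the sample curve are illustrative assumptions and are not data or software from this study.

```python
def resolution_at_threshold(spatial_freqs, fsc_values, threshold):
    """Return the resolution (in Å) where the FSC curve first drops below threshold.

    spatial_freqs: increasing spatial frequencies in 1/Å (one per shell).
    fsc_values: FSC value for each shell.
    Linear interpolation is used between the two shells that bracket the
    threshold. Returns None if the curve never crosses the threshold.
    """
    for i in range(1, len(fsc_values)):
        prev, curr = fsc_values[i - 1], fsc_values[i]
        if prev >= threshold > curr:
            frac = (prev - threshold) / (prev - curr)
            freq = spatial_freqs[i - 1] + frac * (spatial_freqs[i] - spatial_freqs[i - 1])
            return 1.0 / freq
    return None


# Hypothetical FSC curve for illustration only.
freqs = [0.05, 0.10, 0.20, 0.25, 0.40]
fsc = [1.00, 0.95, 0.60, 0.40, 0.00]

print(resolution_at_threshold(freqs, fsc, 0.5))    # model-map style threshold
print(resolution_at_threshold(freqs, fsc, 0.143))  # half-map style threshold
```

Because the 0.5 threshold is crossed at a lower spatial frequency than the 0.143 threshold on a monotonically falling curve, the resolution read at 0.5 is numerically coarser, which matches the pattern of model versus map resolution in the table.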
- Accession codes, unique identifiers, or web links for publicly available datasets
- A description of any restrictions on data availability

Atomic coordinates of six HzTransib or HzTransib-TIR DNA complex structures have been deposited in PDB under accession numbers 6PQN (HzTransib Apo), 6PQR
(HzTransib-intact TIR PRC), 6PQU (HzTransib-nicked TIR PRC), 6PQX (HzTransib-TIR HFC), 6PQY (HzTransib-TIR TEC) and 6PR5 (HzTransib-TIR STC). Five cryo-EM
density maps of HzTransib complexed with different TIR DNA have been deposited in the Electron Microscopy Data Bank under accession numbers EMD-20452,
EMD-20453, EMD-20455, EMD-20456 and EMD-20457, respectively. The atomic coordinates and cryo-EM density maps will be publicly available prior to publication.
Data exclusions: No data were excluded from analyses.

Replication: All biochemical assays were repeated three times. Each replicate was an independent experiment and did not represent re-assay of the same
material. All attempts at replication were successful.

Randomization: Randomization was not relevant to our study because our study did not involve the allocation of samples/organisms/participants into
experimental groups.

Blinding: Investigators were not blinded to group allocation because group allocation was not involved in our study. Investigators were not blinded
during data collection because the data being collected were quantitative in nature (gels or numbers of colonies on a plate) and were not
prone to subjective interpretation.
RAF family kinases are RAS-activated switches that initiate signalling through the MAP
kinase cascade to control cellular proliferation, differentiation and survival. RAF
activity is tightly regulated and inappropriate activation is a frequent cause of
cancer; however, the structural basis for RAF regulation is poorly understood at
present. Here we use cryo-electron microscopy to determine autoinhibited and
active-state structures of full-length BRAF in complexes with MEK1 and a 14-3-3 dimer.
The reconstruction reveals an inactive BRAF-MEK1 complex restrained in a cradle
formed by the 14-3-3 dimer, which binds the phosphorylated S365 and S729 sites that
flank the BRAF kinase domain. The BRAF cysteine-rich domain occupies a central
position that stabilizes this assembly, but the adjacent RAS-binding domain is poorly
ordered and peripheral. The 14-3-3 cradle maintains autoinhibition by sequestering
the membrane-binding cysteine-rich domain and blocking dimerization of the BRAF
kinase domain. In the active state, these inhibitory interactions are released and a
single 14-3-3 dimer rearranges to bridge the C-terminal pS729 binding sites of two
BRAFs, which drives the formation of an active, back-to-back BRAF dimer. Our
structural snapshots provide a foundation for understanding normal RAF regulation
and its mutational disruption in cancer and developmental syndromes.


RAF activity is restrained by an intricate interplay that involves phosphorylation
events, binding to 14-3-3 proteins, and intramolecular autoinhibitory interactions.
The mammalian RAF kinases ARAF, BRAF and CRAF share three conserved regions
(CR1, CR2 and CR3; Fig. 1a). The N-terminal CR1 region contains the RAS-binding
domain (RBD) and the cysteine-rich domain (CRD), whereas the C-terminal CR3 region
contains the serine/threonine kinase domain and a motif that, when phosphorylated,
serves as a binding site for 14-3-3 proteins. The intervening CR2 region consists of
a second 14-3-3 recognition site (Fig. 1a). 14-3-3s are dimeric proteins that bind
specific serine- or threonine-phosphorylated motifs in diverse signalling proteins.
In the absence of activating interactions with RAS, RAF proteins are thought to be
maintained in an autoinhibited state that involves intramolecular interaction of the
CR1 region with the kinase domain, and the binding of 14-3-3 proteins to the
phosphorylated CR2 (pS365 in BRAF) and C-terminal (pS729 in BRAF) 14-3-3 binding
sites. RAF is recruited to the plasma membrane and activated in a process that
involves the binding of GTP-bound RAS to its RBD domain. The adjacent CRD is also
important for RAS-driven recruitment and activation. Structurally, the CRD is a C1
domain, a small modular domain found in many lipid- or membrane-activated
signalling proteins. Normal RAF activation requires dimerization of the kinase
domain, and active RAFs form both homo- and heterodimers. MEK1 and MEK2 are the
only known RAF substrates, and MEKs in turn selectively phosphorylate ERK1 and
ERK2, the terminal kinases in the RAS/MAP kinase cascade. Recent work has revealed
that BRAF is pre-associated with MEK in the quiescent state. Previous structural
studies of BRAF and other family members have been restricted to isolated domains
or fragments of these proteins. To aid in developing an integrated structural
understanding of the normal regulation of RAFs and their pathological activation in
cancer, we sought to prepare and structurally characterize intact BRAF in
autoinhibited and active states in complexes with MEK1 and 14-3-3 proteins.


Overall structure of autoinhibited BRAF 


We co-expressed full-length wild-type BRAF with full-length MEK1 in
insect (Sf9) cells. To eliminate potential heterogeneity due to phosphorylation
of the MEK1 activation loop, we used a variant of MEK1 in which these
phosphorylation sites were mutated to alanine (S218A/S222A) in our structural
studies. Affinity purification of the expressed proteins yielded well-defined
complexes that also contained insect-cell-derived 14-3-3ε,ζ dimers (see Methods
and Extended Data Fig. 1a). Co-expression of human 14-3-3 isoforms with BRAF and
MEK1 did not fully displace the insect-cell 14-3-3s and led to increased
heterogeneity (Extended Data Fig. 1b). We therefore exploited the binding of the
abundant and highly conserved endogenous 14-3-3 proteins. This approach
enabled us to isolate ‘monomeric’ complexes that contained a single
chain of each of BRAF, MEK1 and the two 14-3-3 subunits (Extended
Fig. 1 | Structure of an autoinhibited BRAF-MEK1-14-3-3 complex.
a, Schematic showing the domain organization of BRAF and MEK1. Key
regulatory phosphorylation sites are indicated above the schematics and
residue numbers for domain boundaries are shown below. BRS, BRAF-specific
domain, which is unique to BRAF; NR, MEK negative regulatory region.
b, Single-particle reconstruction cryo-EM density map derived from imaging of
the full-length BRAF-MEK1-14-3-3 complex, coloured according to local
resolution. c, Ribbon diagram showing the overall structure of the complex.
BRAF and MEK1 domains are coloured as in a, and the two subunits of the 14-3-3
dimer are shown in orange and tan. Segments of BRAF containing the pS365
and pS729 regulatory sites bind opposite sides of the 14-3-3 dimer, and are
shown in red. The CRD occupies a central location in the complex, and has
contacts to both 14-3-3 subunits, both the pS365 and pS729 regulatory
segments, and the BRAF kinase domain.


Data Fig. 1a). Consistent with an autoinhibited state, we found that both 
the S365 and S729 sites on BRAF were highly phosphorylated whereas 
those on the activation segment were not (Extended Data Fig. 1c). In 
complexes with wild-type MEK1, the MEK1 activation segment sites 
were also predominantly unphosphorylated (Extended Data Fig. 1c). 

We prepared a 192-kDa BRAF-MEK1-14-3-3ε,ζ complex in the presence
of the ATP analogue ATP-γ-S and the MEK inhibitor GDC-0623.
Cryo-electron microscopy (cryo-EM) imaging of this complex revealed
well-dispersed particles, with two-dimensional (2D) class averages
showing obvious secondary structure features (Extended Data Fig. 1d,
e). Approximately 8,400 micrograph movies afforded single-particle
reconstructions of this complex at a nominal resolution of 4.1 Å (Fig. 1b,
Extended Data Fig. 1f), as detailed further in Methods and Extended
Data Table 1.

The cryo-EM map revealed a compact structure, in which inactive
BRAF is secured in a 14-3-3 ‘cradle’ by extensive interactions with the
14-3-3 dimer (Fig. 1c). The 14-3-3 engages both cognate sites in BRAF:
the phosphorylated CR2 site (pS365) is bound in the recognition groove
on one side of the 14-3-3 dimer, whereas the C-terminal (pS729) motif is
bound in the groove on the opposite side of the dimer. The BRAF CRD
domain is particularly central to the overall architecture of the complex.
It contacts both subunits of the 14-3-3 dimer, both the pS365 and the
pS729 binding motifs, and the C-lobe of the BRAF kinase domain. The
BRAF kinase domain is oriented such that its active site faces away from
the 14-3-3 domain, enabling it to coordinate MEK1 in a ‘face-to-face’
orientation. Both the MEK1 and the BRAF kinase domains exhibit stereotypical
inactive conformations, in which their regulatory αC-helices
are displaced from their active positions. The N-terminal BRAF-specific
domain and the RBD of BRAF are not clearly defined in the cryo-EM map.
A reconstruction that was filtered to a resolution of 5 Å and contoured
at a lower level revealed density adjacent to the CRD that corresponds
to the RBD domain (Extended Data Fig. 1g), but it did not provide sufficient
detail to enable the positioning of an RBD model.


The 14-3-3 dimer organizes inactive BRAF 


Our BRAF complexes contain an approximately equimolar ratio of the ε
and ζ isoforms (Extended Data Fig. 1a), and we expect that each side of
the 14-3-3 dimer is a mixture of the two isoforms in our reconstructions.
For simplicity and convenience, our model is constructed using the
Spodoptera frugiperda 14-3-3ζ sequence for both subunits, but with residue
numbering corresponding to the human 14-3-3ζ isoform. The 14-3-3
dimer interacts with every ordered domain of BRAF, and the interacting
residues are highly conserved across all 14-3-3 isoforms (Extended
Data Fig. 2a). The most N-terminal portion of BRAF that is well defined
in the cryo-EM maps is the CRD domain. The CRD domain fold, which
is approximately 50 residues in length, contains a small β-sheet and is
stabilized by two zinc coordination sites (Fig. 1c). The domain binds
in the centre of the 14-3-3 cradle, with contacts to both subunits of the
dimer (Extended Data Fig. 2b). Notably, two loops of the CRD domain
that are expected to mediate association of the domain with the membrane
(residues 239–245 and 253–260) make extensive contact
with the 14-3-3 domain in the autoinhibited complex (Extended Data
Fig. 3a). Previous mutagenesis studies of the CRAF CRD have identified
two residues in this region that are important for binding to 14-3-3. The
corresponding residues in BRAF (R239 and T241) are indeed found at
the interface with the 14-3-3 domain (Extended Data Fig. 3a).

The poorly conserved linker that connects the CRD domain and
the CR2 region is not visible in our map, but the phosphorylated CR2
segment is well defined in the phosphopeptide recognition groove
on one side of the 14-3-3 dimer (Extended Data Fig. 3b). The ordered
CR2 segment extends from Q359 to I371, with pS365 roughly at its centre.
Beyond I371, the linker that connects CR2 to the kinase domain is
not visible. The BRAF kinase C-lobe contacts both 14-3-3 subunits but
interacts most extensively with the pS365-binding subunit, packing
against its α9 helix and α8–α9 loop (Extended Data Fig. 3c). This portion
of the 14-3-3 domain also contacts H510 and adjacent residues
in the N-terminal lobe of the kinase domain. We observe continuous
density connecting the C terminus of the BRAF kinase domain with the
pS729 14-3-3 binding motif, which occupies the recognition groove on
the opposite side of the 14-3-3 dimer (Extended Data Fig. 3d, e). BRAF
residues S732–A736 thread between the CRD and the 14-3-3 domain
as they exit the recognition groove, and weak density corresponding
to a few additional residues indicates that the BRAF C terminus passes
across a hydrophobic surface on the CRD domain before it becomes
substantially disordered. The interactions of the 14-3-3 domain with the
CR2 segment in the present structure are similar to those observed in
a crystal structure of human 14-3-3ζ in complex with a CRAF peptide
(Extended Data Fig. 3f).


The autoinhibited BRAF-MEK kinase module 


The BRAF and MEK1 kinase domains bind with their active-site clefts
juxtaposed, and both kinases exhibit inactive conformations (Fig. 2a).
Density for ATP-γ-S is visible in the BRAF active-site cleft (Extended Data
Fig. 4a); in the MEK active site we also observe density corresponding
to a bound nucleotide, which is seemingly ADP (Extended Data Fig. 4b).
The inactive conformation of MEK1 and the face-to-face kinase orientation
seen here is similar to that previously observed for the isolated
Fig. 2 | Conformation of the autoinhibited BRAF kinase domain and location
of oncogenic mutations. a, BRAF coordinates MEK1 in a face-to-face
orientation, with extensive contact between the kinase C-lobes. Both kinases
adopt an inactive, αC-out conformation. b, Overall view of the autoinhibited
BRAF kinase domain. The C-helix (magenta) is propped in an outward, inactive
conformation by the inhibitory turn in the activation segment (orange).
ATP-γ-S is bound in the active-site cleft. c, Detailed view of the structure and
interactions of the inhibitory turn. Residue V600, the most common site of
oncogenic mutations, is part of a cluster of hydrophobic residues (yellow) that
stabilize this inactive conformation. K601 also stabilizes this configuration; it
forms a hydrogen bond with G596 in the DFG motif and packs with F468 in the
glycine-rich loop. d, Oncogenic mutations (red) cluster in the inhibitory turn or
in residues that coordinate ATP. a and b are drawn from the cryo-EM structure;
c and d from the crystal structure of the autoinhibited kinase domain complex.


kinase domains of BRAF and MEK1. However, the BRAF kinase domain
in the previous structure adopted an active but nucleotide-free conformation,
as compared with the inactive, nucleotide-bound state in the
present structure (Extended Data Fig. 4c, d). BRAF coordinates MEK1
through an extensive interface that involves primarily the C-lobes of
both kinases, including the kinase activation segments, which interact
in an antiparallel manner (Extended Data Fig. 4e, f).

In the BRAF kinase domain, the outward inactive position of the
αC-helix is enforced by residues 598-602 in the activation segment,
which form a helix-like turn that we refer to as the ‘inhibitory turn’
(Fig. 2b). The inhibitory turn packs together with hydrophobic residues
in the glycine-rich loop, the C-helix and the β3 strand to stabilize the
inactive state. This inhibitory arrangement resembles that observed
in the inactive states of other kinases including CDK2, Src and EGFR. A
superficially similar configuration has been observed in crystal structures
of the BRAF kinase domain crystallized with sulfonamide-class
inhibitors; however, direct comparison with the present structure
reveals marked differences in the activation segment and in the relative
orientation of the N- and C-lobes of the kinase domain (Extended
Data Fig. 4g, h). The inhibitor-bound conformation is approximately
15° more open as compared with the nucleotide-bound inactive state
observed in the present structure. This difference prompted us to
systematically examine the relative N- and C-lobe orientation in more than
50 BRAF kinase structures available in the Protein Data Bank. Notably,
all previous BRAF kinase structures, none of which contain ATP or an
ATP analogue, exhibit a markedly more open active-site cleft (owing
to N-lobe rotations of 8-17°) as compared with the autoinhibited,
nucleotide-bound structure described here (Extended Data Fig. 4i, j).
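
The text does not state how the N-lobe rotations were measured. As a minimal sketch of one common approach, assume the C-lobes of two structures have already been superimposed and that a superposition program has returned the 3×3 rotation matrix that then maps one N-lobe onto the other; the rotation angle follows from the trace of that matrix. The function names and the test matrix below are illustrative assumptions, not part of the original analysis.

```python
import math


def rotation_angle_degrees(rotation):
    """Angle (in degrees) of a proper 3x3 rotation matrix, from its trace.

    For a rotation by theta about any axis, trace(R) = 1 + 2*cos(theta).
    """
    trace = rotation[0][0] + rotation[1][1] + rotation[2][2]
    # Clamp to [-1, 1] to guard against floating-point round-off.
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))
    return math.degrees(math.acos(cos_theta))


def rotation_about_z(degrees):
    """Helper for the example: rotation matrix about the z axis."""
    t = math.radians(degrees)
    return [
        [math.cos(t), -math.sin(t), 0.0],
        [math.sin(t), math.cos(t), 0.0],
        [0.0, 0.0, 1.0],
    ]


# Hypothetical 15-degree lobe rotation, used only to exercise the function.
print(round(rotation_angle_degrees(rotation_about_z(15.0)), 3))
```

The trace formula gives the magnitude of the rotation only; the rotation axis, which describes the direction of lobe opening, would need to be extracted separately.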

To obtain a higher-resolution view of the autoinhibited BRAF kinase
domain, we co-expressed it with full-length MEK1(S218A/S222A) in
insect cells and crystallized the purified complex in the presence of
GDC-0623 and the ATP analogue AMP-PNP (Extended Data Fig. 5a).
The resulting crystal structure, determined at a resolution of 2.6 Å,
superimposes closely onto the corresponding portion of the autoinhibited
BRAF-MEK1-14-3-3 cryo-EM structure (root mean square
deviation, 0.56 Å; Extended Data Fig. 5b), and the regions of interest
discussed above are highly similar. The crystal structure reveals in
detail interactions that stabilize the inhibitory turn in the BRAF kinase
domain (Fig. 2c), and interactions with the bound nucleotide (Extended
Data Fig. 5c). Notably, oncogenic mutations in the BRAF kinase domain
cluster in a small region that contains both the inhibitory turn and
nucleotide-binding residues (Fig. 2d). Considering the extent of the
interactions with the nucleotide and the unique N-lobe orientation
observed in both the cryo-EM structure and the crystal structure, we
propose that ATP binding is an essential feature of the autoinhibited
state. We also observe a hydrogen bond between MEK1 residue E102
and the ribose group of the AMP-PNP bound to the BRAF kinase domain
(Extended Data Fig. 5d).
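
For readers unfamiliar with the root mean square deviation quoted above, the following minimal sketch computes it for two sets of matched atoms that are assumed to be already superimposed; the optimal superposition step itself is not shown. The coordinates are made-up values for illustration and are not taken from the deposited structures.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation (same units as the input, e.g. Å).

    coords_a, coords_b: equal-length lists of (x, y, z) tuples for matched
    atoms, assumed to be already superimposed in a common frame.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical matched atom positions, for illustration only.
model_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_b = [(0.0, 0.0, 0.5), (1.0, 0.5, 0.0)]
print(rmsd(model_a, model_b))
```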

The MEK1 portion of the crystal structure includes an N-terminal
helix that is the site of rare activating mutations, both in cancer and
in a ‘RASopathy’ known as cardio-cutaneo-facial syndrome, a genetic
developmental disorder that stems from aberrant signalling in the
MAP kinase pathway. This α-helix packs across the back of the N-lobe
(Extended Data Fig. 5b, e), apparently contributing to the stability of
the inactive αC-out conformation of MEK, as seen in previous work.
For reasons that are not yet clear, this helix is not resolved in the
cryo-EM map of the autoinhibited complex.


Inhibitory mechanisms of the 14-3-3 dimer 


The crystal structure described above shows that the MEK1 and BRAF
kinase domains can adopt their mutually inhibited conformations in
the absence of any interactions with the 14-3-3 protein. This raises the
question of the role of the 14-3-3 dimer in BRAF inhibition. Our structure
suggests that, rather than inducing an inactive conformation in the
kinase domain, the 14-3-3 maintains the inhibited state by sterically
blocking formation of the BRAF kinase domain dimer that is required
for BRAF activation. In the cryo-EM structure of the autoinhibited
complex, the surface corresponding to the BRAF dimer interface is
obstructed by the bound 14-3-3 dimer (Fig. 3a–c). In particular, dimer
interface residues H510, D565 and Y566 are all in contact with the 14-3-3
domain. Additionally, the 14-3-3 domain sequesters the CRD domain,
which is crucial for RAS-driven activation and membrane recruitment
of BRAF. The surface corresponding to the membrane-binding loops
of the CRD is largely occluded in the autoinhibited complex (Fig. 3d).

The overall architecture of autoinhibited BRAF is probably shared
with both ARAF and CRAF, as key interdomain contacts are highly conserved
among these proteins (Extended Data Fig. 6). Consistent with
our structural findings, early structure-function studies established a
key role for the CRD in maintaining RAF in an autoinhibited state. An
alanine scanning mutagenesis study of the CRAF CRD domain identified
mutations in 11 surface-exposed residues that increased the RAS(G12V)-dependent
activation of CRAF, including two that fully activated CRAF
in the absence of mutant RAS. All 11 of the corresponding residues
in BRAF are located at interdomain contacts in the present structure
(Extended Data Fig. 2b). Perhaps most compellingly, the BRAF CRD
domain is a hot spot for germline mutations that cause Noonan syndrome
and related RASopathies. Altered residues map to sites of
contact with 14-3-3 or the BRAF kinase domain in the present structure,
providing a structural rationale for their activating effects (Extended
Data Fig. 3g).


Structures of active BRAF complexes 


The autoinhibited structure described above reveals a clear role for 
phosphorylation of both S365 and S729 in RAF autoinhibition. To 
further explore the role of these modifications in RAF regulation, we 
Fig. 3 | The 14-3-3 domain blocks the BRAF dimer interface and occludes the
membrane-binding region of the CRD domain. a, Surface representation of
an active BRAF kinase domain dimer, with interface residues shaded white for
one subunit and grey for the opposite subunit (drawn from PDB entry 4MNE).
b, Oblique view of the autoinhibited BRAF complex, with BRAF kinase domain
residues that contact the α8–α9 loop of the 14-3-3 domain shaded yellow.
c, Comparison of the 14-3-3 contact (left) and the BRAF dimer interface
(centre). The respective surfaces are overlaid on the right, demonstrating that
the interaction with 14-3-3 will sterically interfere with the dimerization of
BRAF kinase. d, Surface view of the complex in the region of the CRD, with the
putative membrane-binding loops of the CRD shaded green.


prepared the following BRAF variants: BRAF(S36S5A), with a serine-to- 
alanine mutation at residue 365; BRAF(S729A); and the double mutant 
BRAF(S365A/S729A). We then expressed these variants with or without 
the co-expression of MEK 1in insect cells. Although we obtained soluble, 
stable BRAF—MEK1-14-3-3 complexes in experiments with BRAF(S365A) 
(Extended Data Fig. 7a), experiments with the S729A and S365A/S729A 
variants yielded little BRAF, and it was largely aggregated and did not 
co-purify with 14-3-3 proteins (data not shown). Size-exclusion chro- 
matography of the BRAF(S365A) sample revealed a broad peak con- 
taining BRAF(S365A), MEK1 and the 14-3-3 dimer. Examination of the 
phosphorylation state of the BRAF(S365A) in this peak revealed near 
stoichiometric phosphorylation of S729, but little phosphorylation 
of activation-segment sites T599 and S602 (Extended Data Fig. 1c). 
Nevertheless, the purified complex was highly active in MEK phospho- 
rylation assays (Fig. 4a, Extended Data Fig. 7a), and cryo-EM imaging 
of the complex revealed 2D class averages that were consistent with 
larger, dimeric complexes (Extended Data Fig. 8a). 
A three-dimensional (3D) reconstruction, at approximately 5 Å 
resolution, of the predominant species in this BRAF(S365A)-MEK1-14-3-3 
sample revealed an active, back-to-back BRAF kinase dimer, with MEK1 
bound to each BRAF kinase domain (Fig. 4b, Extended Data Fig. 8b). 
A single 14-3-3 dimer bridges the phosphorylated pS729 sites at the C 
termini of the two BRAF kinase domains. We do not observe interpret- 
able density for BRAF regions preceding the kinase domain, nor for 
the C terminus beyond S734. We built a model into this cryo-EM map 
by domain-wise rigid-body fitting of the previously reported active 
MEK-BRAF kinase domain complex (PDB ID: 4MNE) and the 14-3-3 
Fig. 4 | Structure and activity of active, dimeric BRAF-MEK1-14-3-3 
complexes. a, Kinase activity assays for autoinhibited and active BRAF 
complexes. Time course of phosphorylation of an exogenous, kinase-dead 
MEK1 substrate is measured by western blotting for pS218/222 MEK for the 
autoinhibited wild-type monomer complex (left), the BRAF(S365A) complex 
(centre), and a wild-type BRAF-14-3-3 complex prepared without co-expression 
of MEK1 in insect cells (right). Blots for BRAF (anti-strepII) and MEK are 
provided as loading controls for enzyme and substrate, respectively. For gel 
source data, see Supplementary Fig. 1. b, Cryo-EM structure of the active, 
dimeric BRAF(S365A)-MEK1-14-3-3 complex. In this active configuration, the 
14-3-3 dimer bridges the S729-phosphorylated tails of the back-to-back BRAF 
dimer (right). Portions of BRAF that are N-terminal to the kinase domain are not 
visible in the reconstruction. c, The same preparations also contained 
BRAF(S365A)-MEK1-14-3-3 complexes with only one MEK1 subunit. The overall 
organization of the remaining subunits is very similar, but the 14-3-3 dimer 
pivots to the side of the missing MEK. See also Extended Data Fig. 8. 


dimer from the autoinhibited complex described here. The BRAF- 
MEK kinase domain portion of the structure exhibits the same overall 
organization as the previous structure, and inspection of the cryo-EM 
map confirms that the BRAF C-helix is in its inward position, as expected 
for the active dimer. 

Three-dimensional classification of particles from the same set of 
images enabled the reconstruction of a second particle similar to the 
one described above, but with only a single MEK1 bound to the BRAF 
dimer (Fig. 4c). In this ‘MEK-lite’ complex, the 14-3-3 dimer cants to the 
side of the missing MEK, assuming a more asymmetric position with 
respect to the back-to-back BRAF kinase domain dimer. 

We also expressed wild-type BRAF alone (without co-expression of 
MEK1) in both insect and mammalian (HEK293) cells, and obtained 
soluble BRAF in complex with endogenous 14-3-3 proteins using both 
expression systems (Extended Data Fig. 7b-f). Quantification of phos- 
phorylation in the elution fractions by mass spectrometry revealed 
near-stoichiometric phosphorylation of S729 and a high level of S365 
phosphorylation in peak fractions, but negligible phosphorylation of 
both T599 and S602 in BRAF produced by both mammalian and insect 
cells (Extended Data Fig. 8c-f). The BRAF-14-3-3 complex was highly 
active in a MEK phosphorylation assay (Fig. 4a, Extended Data Fig. 7b). 


Cryo-EM imaging of the mammalian-expressed complex revealed 
predominant 2D class averages that were consistent with a 14-3-3-bound 
BRAF dimer, as did imaging of the same sample supplemented with RAF 
inhibitor GDC-0879 (Extended Data Fig. 8g). We obtained a 3D recon- 
struction of the inhibitor-bound BRAF-14-3-3 complex at a nominal 
resolution of 7 Å, which confirmed the dimeric state of the complex 
(Extended Data Fig. 8h, i). As with the MEK-bound dimer, the BRAF 
kinase domain forms the expected symmetrical, back-to-back dimer in 
this structure and we do not observe the N-terminal domains of BRAF. 
Despite the fact that the 14-3-3 dimer bridges the pS729 sites of the two 
kinase domains, it adopts a highly asymmetric position with respect 
to the kinase dimer (Extended Data Fig. 8i). In this skewed position, 
the 14-3-3 dimer approaches the active-site cleft and intrudes into the 
MEK-binding region of one BRAF kinase domain, but not the other. 


Discussion 


In the quiescent state, BRAF, MEK1, and a 14-3-3 dimer form a tightly 
integrated signalling device. In light of their extensive interactions, we 
propose that the RAF-MEK-14-3-3 complex—rather than RAF itself— 
serves as the RAS-activated switch that initiates signalling through the 
MAP kinase cascade. Phosphorylation of both of the 14-3-3-binding sites 
and engagement by a 14-3-3 dimer are required for maturation of RAF into 
its regulated, inactive state. Our structural and biochemical findings 
suggest that MEK also contributes to the stability of the inactive state 
of BRAF, but we do not exclude the possibility that RAFs can assemble 
into an autoinhibited 14-3-3 complex without MEK. The essential role of 
pS729 in both the autoinhibited and active states of the kinase, and its 
stoichiometric phosphorylation in our purified complexes, leads us to 
suggest that this is a structural phosphorylation, rather than a regula- 
tory one. It is noteworthy that, although phosphorylation on T599 and 
S602 is widely thought to play a crucial role in BRAF activation, we 
find little to no phosphorylation on these sites in active BRAF-14-3-3 
dimers. The potential role of activation-loop phosphorylation in RAF 
regulation merits further study. 

The structures described here provide views of RAF in its quiescent 
and active states and, in light of previous functional dissection of RAF 
regulation, they outline a model for RAF activation (Extended Data 
Fig. 9). In the autoinhibited state the RBD is exposed, enabling 
recruitment of the quiescent complex to the membrane by activated RAS. By 
contrast, the CRD and its membrane-binding surface are largely buried by 
interactions with the 14-3-3 dimer and other segments of RAF, suggest- 
ing that its ‘extraction’ upon RAS binding and membrane localization 
is the key event that promotes the release of the inhibitory position 
of the 14-3-3 domain. When released from its inhibitory position, the 
14-3-3 dimer can rearrange to bridge the pS729 sites in the C-terminal 
tails of two BRAFs, driving formation of the active BRAF dimer. Once 
activated, BRAF can phosphorylate MEK, which promotes its release. 
Steric effects of the 14-3-3 domain could also modulate the affinity 
for MEK, as evidenced by the asymmetric position assumed by 14-3-3 
upon MEK release. 

The inactive-state structures described here reveal the bona fide 
inactive conformation of BRAF, and thereby provide a structural foundation 
for understanding its activation by mutations in cancer. Oncogenic 
mutations of V600 and K601 in the inhibitory turn are not compatible 
with the structural context of these residues, providing a rationale for 
their activating effect via destabilization of the inhibitory turn (Fig. 2c). 
Other less common oncogenic BRAF mutations occur in residues that 
participate directly in the coordination of ATP and its associated 
divalent cation (Fig. 2d), and they may destabilize the autoinhibited state 
by weakening interactions with ATP and/or by disrupting interactions 
of the glycine-rich loop with the inhibitory turn in the activation 
segment. Many of the same BRAF residues are also altered in RASopathies 
(Extended Data Fig. 5f). Outside of the kinase domain, CRAF and ARAF 
contain somatic point mutations in or near the CR2 phosphorylation 


site in diverse cancers, including lung adenocarcinoma. These mutations 
eradicate the CR2 14-3-3-binding site, promoting formation of the 
active RAF dimer, as we observe here with the BRAF(S365A) mutant. 
The KIAA1549:BRAF truncation/fusion oncoprotein that is found in 
paediatric low-grade gliomas lacks the entire CR1 and CR2 regions of 
BRAF, and is therefore constitutively active. 

The integral nature of the RAF-MEK-14-3-3 switch has important 
pharmacologic implications. It is well established that certain MEK 
and RAF inhibitors can stabilize or destabilize their interaction. 
However, the notion that the RAF-MEK-14-3-3 complex—which is dis- 
tinct from the isolated RAF and MEK kinases—may represent a relevant 
pharmacologic receptor for a broader range of inhibitors has not, to our 
knowledge, been systematically explored. Perhaps the most perplexing 
aspect of RAF-inhibitor pharmacology is the paradoxical activation of 
the MAP kinase pathway by certain RAF kinase inhibitors. Diverse 
RAF inhibitors disrupt autoinhibitory interactions of the BRAF kinase 
with its N-terminal region, and some promote dimerization of the 
isolated BRAF kinase domain. Considering the extensive interactions of 
BRAF with ATP in the autoinhibited state, we speculate that RAF inhibi- 
tors may promote conformational activation by displacing ATP from 
quiescent RAF. Whether this leads to observed paradoxical pathway 
activation will in turn depend upon ensuing cellular events (potentially 
including changes in RAF phosphorylation state, RAS-binding, membrane 
localization and 14-3-3 rearrangements) and on the potency of 
a particular agent as an inhibitor of activated RAF dimers. 

Many questions regarding RAF regulation remain. The structures 
described here and the ability to prepare full-length autoregulated and 
active BRAF will inform and enable detailed mechanistic studies of RAF 
activation and RAF-inhibitor pharmacology. In the long term, a deeper 
understanding of RAF regulation should aid in the development of more 
effective and better-tolerated therapeutics for RAF-driven cancers. 
Methods 


Data reporting 

No statistical methods were used to predetermine sample size. The 
experiments were not randomized and the investigators were not 
blinded to allocation during experiments and outcome assessment. 


Cell lines 

The insect (Sf9 and Hi5) and mammalian (Expi293F) cell lines used for 
protein production were obtained from Thermo Fisher Scientific, and 
tested negative for mycoplasma contamination. 


Preparation of BRAF-14-3-3 complexes from Sf9 insect cells 
Recombinant baculovirus expressing full-length human BRAF with an 
N-terminal His-tag and a C-terminal StrepII tag was prepared using 
baculoviral transfer vector pAc8. Recombinant baculovirus expressing 
the variant BRAF(S365A) was produced in the same manner. For pro- 
tein production using the baculovirus/insect cell expression system, 
litre-scale cultures of Sf9 cells (4 l total for a typical preparation) were 
infected with high-titre viral stocks expressing wild-type or mutant 
BRAF (1% of final culture volume). Cells were collected 65-72 h 
post-infection, lysed in lysis buffer (50 mM Tris pH 7.4, 150 mM NaCl, 2 mM 
MgCl₂, 0.5 mM TCEP, 50 µM ATP-γ-S and protease inhibitor cocktail 
(Thermo Fisher Scientific)), and applied to Ni-NTA agarose beads (Qiagen). 
After washing with Buffer A supplemented with 20 mM imidazole 
(Buffer A contains 50 mM Tris pH 7.4, 150 mM NaCl, 2 mM MgCl₂, 0.5 
mM TCEP, 10 µM ATP-γ-S), bound proteins were eluted with Buffer A 
supplemented with 500 mM imidazole and adjusted to pH 8.0. Elu- 
ent was applied to a prepacked StrepTrap HP column (GE Healthcare 
Life Sciences) and washed with Buffer A adjusted to pH 8.0 (Buffer A’). 
Bound proteins were eluted with Buffer A’ supplemented with 10 mM 
desthiobiotin. The eluted complex was concentrated to approximately 
2 mg ml⁻¹ using an Amicon Ultra concentrator (50 MWCO, Millipore) and 
further purified by size-exclusion chromatography (SEC) on a Superdex 
200 Increase 10/300 column or Superose 6 Increase 10/300 (GE 
Healthcare Life Sciences) in Buffer A. Analysis of the purified samples 
by SDS-PAGE revealed the baculovirus-expressed BRAF co-purified 
at a high stoichiometry with insect-cell-derived 14-3-3ε and 14-3-3ζ. 


Preparation of BRAF-MEK1-14-3-3 complexes from Sf9 insect cells 
Recombinant baculovirus expressing full-length human MEK1 (either 
wild-type MEK1 or MEK1(S218A/S222A)) fused with an N-terminal 
His-tag was prepared using baculoviral transfer vector pAc8. BRAF- 
MEK1-14-3-3 complexes (either wild-type or with the desired BRAF 
and/or MEK mutants) were prepared by co-expression in insect cells 
using separate baculoviruses for MEK1 and BRAF. Litre-scale cultures 
(4 l total for a typical preparation) were co-infected with high-titre 
viral stocks expressing the desired BRAF and MEK1 variants at a 1:1.5 
ratio (by volume, 1% culture volumes of BRAF virus, 1.5% of MEK1), and 
cells were collected by centrifugation 65-72 h post-infection. BRAF- 
MEK1-14-3-3 complexes were purified from cell pellets as described 
above for BRAF-14-3-3 complexes, but all buffers were supplemented 
with MEK inhibitor GDC-0623 to a final concentration of 2 µM. As with 
BRAF alone, co-expressed BRAF and MEK1 co-purified with insect-cell- 
derived 14-3-3ε and 14-3-3ζ. 
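
As a quick arithmetic check on the co-infection ratio described above, the following sketch converts the stated percentages into inoculum volumes for a 4 l preparation. The function name is hypothetical, and the sketch assumes that each percentage is a volume fraction of the total culture volume.

```python
def virus_volumes_ml(culture_litres, braf_percent=1.0, mek1_percent=1.5):
    """Return (BRAF virus, MEK1 virus) inoculum volumes in ml.

    Assumption: each percentage is taken relative to the total culture
    volume, as implied by "1% culture volumes of BRAF virus, 1.5% of MEK1".
    """
    culture_ml = culture_litres * 1000.0
    braf_ml = culture_ml * braf_percent / 100.0
    mek1_ml = culture_ml * mek1_percent / 100.0
    return braf_ml, mek1_ml


# Typical preparation quoted in the text: 4 l of culture in total.
braf_ml, mek1_ml = virus_volumes_ml(4.0)
print(f"BRAF virus: {braf_ml:.0f} ml, MEK1 virus: {mek1_ml:.0f} ml")
```

The two volumes keep the 1:1.5 ratio regardless of the culture volume passed in.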


Preparation of the BRAF-14-3-3 complex from HEK293 
mammalian cells 

Full-length human BRAF bearing an N-terminal His-tag and a C-terminal 
StrepII tag was cloned into pcDNA 5/FRT/TO vector. For protein 
production, litre-scale suspension cultures (2 l total for a typical 
preparation) of HEK293 cells (Expi293F) were transfected using the Expi293 
expression system according to the manufacturer’s protocol (Thermo 
Fisher Scientific). Cells were collected by centrifugation 48-60 h post- 
transfection. BRAF-14-3-3 complexes were purified from mammalian 


cell pellets as described above for isolation of BRAF-14-3-3 from insect 
cells. Analysis of the purified sample by SDS-PAGE revealed that BRAF 
co-purified with mammalian-cell-derived 14-3-3 isoforms. 


Preparation of Spycatcher-MEK1 from Hi5 insect cells 

We prepared kinase-dead MEK1 (fused to Spycatcher to alter its electro- 
phoretic mobility) for use as a substrate in in vitro BRAF-activity assays. 
Recombinant baculovirus encoding full-length human MEK1(D190N) 
bearing an N-terminal His-tag for purification and a C-terminal Spy-tag 
was prepared using baculoviral transfer vector pAc8. For protein 
production, litre-scale cultures (2 l total for a typical preparation) 
were infected with high-titre viral stocks expressing Spy-tagged 
MEK1(D190N) (1% of final culture volume). Cells were collected 55-65 
h post-infection, lysed in MEK lysis buffer (50 mM HEPES pH 7.4, 150 
mM NaCl, 2 mM MgCl₂, 0.5 mM TCEP) and applied to Ni-NTA agarose 
beads (Qiagen). After washing with MEK lysis buffer supplemented 
with 20 mM imidazole, bound proteins were eluted with lysis buffer 
supplemented with 500 mM imidazole and adjusted to pH 7.4. To ensure 
that MEK1 was not phosphorylated, eluted protein was treated with 
lambda phosphatase overnight at 4 °C before further purification by 
SEC on a Superdex 75 Increase 10/300 column in SEC Buffer (50 mM Tris 
pH 8.0, 150 mM NaCl, 2 mM MgCl₂, 1 mM TCEP). Pooled SEC fractions 
containing Spy-tagged MEK1(D190N) were incubated with Spycatcher 
protein for covalent linkage, as described previously. Analysis of the 
purified Spycatcher-MEK1(D190N) protein by SDS-PAGE confirmed 
that it migrated as expected for a protein of approximately 55 kDa. 
Mass spectrometry and western blotting with pMEK1/2 antibody 
confirmed little or no phosphorylation on the MEK1 activation loop (S218 
and S222). 


Size-exclusion chromatography with multiangle light scattering 
The BRAF-MEK1-14-3-3 complex (prepared with MEK1(S218A/S222A)) 
was applied to a Superdex 200 10/300 GL column (GE Healthcare) in 
50 mM Tris-HCl pH 7.5, 150 mM NaCl, 2 mM MgCl₂, 0.5 mM TCEP, 10 µM 
ATP-γ-S, 2 µM GDC-0623. In-line multi-angle light scattering analysis 
was performed with an OptiLab rEX refractive index detector followed 
by a miniDAWN TREOS light scattering detector, and data were analysed 
with ASTRA (Wyatt Technology). 


Kinase-activity assay 

BRAF activity in SEC elution fractions was measured by diluting an 
aliquot of each fraction fivefold, and adding 1 µl of the diluted 
sample to 14 µl of a reaction mixture containing assay buffer (50 mM Tris 
pH 7.5, 150 mM NaCl, 10 mM MgCl₂, 0.5 mM TCEP, and 1 mM sodium 
vanadate) supplemented with 2.67 µM Spycatcher-MEK1(D190N) as 
a substrate. Kinase reactions were started by addition of 5 µl of 4 mM 
ATP in assay buffer. After incubation for 20 min at 25 °C, reactions 
were stopped by addition of SDS-PAGE sample buffer and heating to 
95 °C. Reaction products were resolved on a Novex 12% Tris-Glycine 
Midi gel (Invitrogen), and subsequently western-blotted with anti- 
phosphoMEK1/2(S218/222) antibody (Cell Signaling Technology) to 
detect phosphorylation of the Spycatcher-MEK substrate (approximately 
55 kDa). 

Time-course kinase assays of BRAF-MEK1-14-3-3, BRAF(S365A)- 
MEK1-14-3-3 and BRAF-14-3-3 samples were performed using the same 
reaction buffer and substrate concentrations described above, but 
each sample was diluted to 100 nM, and 20 µl was used in a final 
reaction volume of 200 µl (final enzyme concentration was 10 nM in the 
reaction mixture). After initiating the assay by addition of ATP, 20 µl 
aliquots were removed from the reaction at the designated time points 
and stopped by mixing with an equal volume of 5X SDS-PAGE sample 
buffer and heating to 95 °C. Reaction products were analysed by SDS- 
PAGE and western blotting for phosphoMEK1/2 as described above. 
The BRAF-MEK1-14-3-3 complexes were prepared with MEK1(S218A/S222A). 
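
The volumes and stock concentrations quoted for the two assay formats can be cross-checked with a simple C1·V1 = C2·V2 dilution calculation. The sketch below is illustrative only: the helper name is hypothetical, and it assumes that the 2.67 µM substrate concentration refers to the 14 µl reaction mixture before the sample and ATP are added, which the text does not state explicitly.

```python
def final_concentration(stock_conc, stock_volume, final_volume):
    """Dilution by C1*V1 = C2*V2; the result carries the units of stock_conc."""
    return stock_conc * stock_volume / final_volume


# Fraction assay: 1 µl diluted sample + 14 µl reaction mixture + 5 µl ATP.
fraction_assay_volume_ul = 1 + 14 + 5

# 5 µl of 4 mM ATP brought to the full reaction volume.
atp_final_mM = final_concentration(4.0, 5, fraction_assay_volume_ul)

# Assumption: 2.67 µM is the substrate concentration in the 14 µl mixture.
substrate_final_uM = final_concentration(2.67, 14, fraction_assay_volume_ul)

# Time course: 20 µl of enzyme at 100 nM in a 200 µl reaction.
enzyme_final_nM = final_concentration(100.0, 20, 200)

print(f"ATP: {atp_final_mM:.2f} mM")
print(f"Substrate: {substrate_final_uM:.2f} µM")
print(f"Enzyme (time course): {enzyme_final_nM:.1f} nM")
```

The time-course result reproduces the 10 nM final enzyme concentration stated in the text.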
Pull-down assay for 14-3-3 association 

Recombinant baculoviruses expressing full-length human 14-3-3β/α, 
14-3-3γ, 14-3-3δ/ζ or 14-3-3ε and bearing an N-terminal Flag-tag were 
prepared using baculoviral transfer vector pAc8. BRAF and MEK1 were 
co-expressed with each of the four different human 14-3-3 isoforms in 
insect cells by co-infection. Co-infected Sf9 cells (100 ml) were collected 
55-65 h post-infection, lysed in lysis buffer, and then parallel aliquots 
of clarified lysate were applied to Strep-TactinXT magnetic beads (IBA 
GmbH) or Anti-DYKDDDDK Magnetic Agarose (anti-Flag, Pierce). After 
washing beads with lysis buffer, bound proteins were eluted with SDS- 
PAGE sample buffer and resolved on 8% Bis-Tris PAGE gels. Parallel gels 
were western-blotted using anti-14-3-3 (pan) and anti-Flag antibodies 
(Cell Signaling Technology). 


Cryo-EM data acquisition and processing 

The MEK1(S218A/S222A) variant was used to prepare all BRAF-MEK1- 
14-3-3 complexes for structural analysis. BRAF-MEK1-14-3-3 complex in 
SEC buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 2 mM MgCl₂, 0.5 mM 
TCEP, 10 µM ATP-γ-S, 2 µM GDC-0623) was applied to glow-discharged 
holey carbon grids (Quantifoil R1.2/1.3, 400 mesh) and vitrified using 
a FEI Vitrobot Mark IV. Frozen hydrated samples were imaged on an 
FEI Titan Krios at 300 kV with a Gatan Quantum Image Filter with K2 
Summit direct detection camera in super-resolution mode with a total 
exposure dose of around 50 electrons. Thirty-five frames per movie 
were collected at a magnification of 130,000×, corresponding to 0.53 Å 
per pixel. In total, 8,440 micrographs were collected at defocus values 
ranging from -1.8 to -3.3 µm from two data collections of 4,097 and 
4,343 images (of which 1,628 images in the later collection were tilted 
by 25° in an attempt to increase observed orientations). The movie 
frames were motion-corrected and dose-weighted by MotionCor2, 
downsampled to 1.06 Å per pixel and contrast transfer function (CTF) 
parameters were estimated by CTFFIND4. Particle picking was carried 
out using crYOLO and template-based particle picking within 
Relion, giving 3,531,955 initial particles. Following successive rounds 
of 2D and 3D classification, 427,592 particles were selected. An 
additional two rounds of 3D classification led to the final reconstruction 
of 4.1 Å from 165,298 particles. Per-particle motion correction (‘particle 
polishing’) alongside per-particle CTF refinement was trialled with no 
improvement in resolution or map quality. Maps used for figures were 
filtered according to local resolution with B-factor sharpening within 
Relion. Models were fit into the map using Coot and further refined 
with PHENIX and REFMAC5. Statistics for the final refinement are 
presented in Extended Data Table 1. 
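
The numbers reported for the autoinhibited-complex data set can be checked for internal consistency. The sketch below is illustrative bookkeeping only, not part of the processing pipeline: the variable names are hypothetical, and the Nyquist limit is computed with the standard rule of twice the pixel size.

```python
# Values quoted in the text for the autoinhibited BRAF-MEK1-14-3-3 data set.
SUPER_RES_PIXEL_A = 0.53   # Å per pixel as collected (super-resolution mode)
BINNED_PIXEL_A = 1.06      # Å per pixel after downsampling
FINAL_RESOLUTION_A = 4.1   # Å, reported for the final reconstruction

# Downsampling factor and the finest spacing the binned images can represent.
binning_factor = BINNED_PIXEL_A / SUPER_RES_PIXEL_A
nyquist_limit_A = 2 * BINNED_PIXEL_A

# Micrographs from the two data collections.
total_micrographs = sum([4097, 4343])

# Particle counts at each stage of selection.
funnel = [
    ("initial picks", 3_531_955),
    ("after 2D/3D classification", 427_592),
    ("final reconstruction", 165_298),
]
initial = funnel[0][1]
for label, count in funnel:
    print(f"{label:<28} {count:>9,d}  {100 * count / initial:6.2f}% of picks")

print(f"binning factor: {binning_factor:.1f}, Nyquist limit: {nyquist_limit_A:.2f} Å")
print(f"micrographs: {total_micrographs}")
```

The check shows that the reported resolution lies well within the Nyquist limit of the downsampled images, so binning by a factor of two does not restrict the reconstruction.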

The BRAF(S365A)-MEK1-14-3-3 complex in SEC buffer (50 mM Tris-HCl 
pH 7.5, 150 mM NaCl, 2 mM MgCl₂, 0.5 mM TCEP, 10 µM ATP-γ-S, 
2 µM GDC-0623) was applied to glow-discharged holey carbon grids 
(Quantifoil R1.2/1.3, 400 mesh) and vitrified using a Leica EM GP. 
Frozen hydrated samples were imaged on an FEI Talos Arctica at 200 kV 
with K3 Summit direct detection camera in counting mode with a total 
exposure dose of around 50 electrons. Forty-two frames per movie were 
collected at a magnification of 36,000×, corresponding to 1.11 Å per 
pixel. Micrographs (3,146) were collected at defocus values ranging 
from -2.0 to -3.0 µm. Initial particle picking was carried out with crYOLO 
and ab initio models were generated in cryoSPARC, showing two 
distinct classes: one showing back-to-back BRAF kinase dimer, with MEK1 
bound to each BRAF kinase domain; the other having only a single MEK1 
bound (MEK-lite). All following steps were carried out within Relion. 
Reference-based picking resulted in 2,008,323 particles. Following 2D 
classification, 1,441,851 particles were subjected to a guided 3D 
classification using single copies of the dimeric and MEK-lite as reference 
models. Dimer particles (705,222) were then subjected to a further 
round of standard 3D classification leaving 425,135 particles. After 
Bayesian polishing and 3D refinement this resulted in a reconstruction 
of 4.9 Å. In addition, 736,629 particles were identified as MEK-lite 
and, after further 3D classification, 595,672 particles were subjected to 
Bayesian polishing and 3D refinement resulting in a 5.7 Å reconstruction. 
Models were built for both reconstructions by rigid-body fitting 
the BRAF and MEK1 kinase domains from PDB entry 4MNE using Coot. 
The 14-3-3 domain was modelled by rigid-body fitting of the insect-cell 
14-3-3 C-domain from the autoinhibited BRAF-MEK1-14-3-3 structure 
described here; each subunit of the 14-3-3 dimer was fit independently. 
C-terminal pS729 tails were manually built for each structure, with 
reliance on the autoinhibited structure for placement of pS729. For 
both reconstructions, cryo-EM maps were deposited in the Electron 
Microscopy Data Bank and polyalanine models were deposited in the 
Protein Data Bank. Data collection and image processing statistics for 
both structures are presented in Extended Data Table 1. 

The mammalian-cell-produced BRAF-14-3-3 complex in SEC buffer 
(50 mM Tris-HCl pH 7.5, 150 mM NaCl, 2 mM MgCl₂, 0.5 mM TCEP, 10 
µM ATP-γ-S, 2 µM GDC-0623) with or without 1 µM GDC-0879 was 
applied to glow-discharged holey carbon grids (Quantifoil R1.2/1.3, 
400 mesh) and vitrified using a Leica EM GP. Frozen hydrated samples 
were imaged on an FEI Titan Krios at 300 kV with a Gatan Quantum 
Image Filter with K3 Summit direct detection camera in counting mode 
with a total exposure dose of around 70 electrons. Fifty frames per 
movie were collected at a magnification of 105,000×, corresponding 
to 0.85 Å per pixel. Micrographs (4,002 and 4,418 per sample, with and 
without 1 µM GDC-0879) were collected at defocus values ranging from 
-1.7 to -2.7 µm. The movies were downsampled to 1.7 Å and particle 
picking was carried out on the GDC-0879 sample with crYOLO, giving 
365,083 particles. After two rounds of 2D classification, 234,539 
particles remained. Two further rounds of 3D classification within 
Relion, using an initial model derived from cryoSPARC, resulted in 
66,215 particles leading to a 6.8 Å reconstruction after 3D refinement. 
The BRAF-14-3-3 model was constructed by rigid-body fitting the 
BRAF and MEK1 kinase domains from PDB entry 4MNE using Coot. 
The 14-3-3 domain was modelled by rigid-body fitting of the human 
14-3-3 C-domain (PDB ID: 3NKX). Each subunit of the 14-3-3 dimer was 
fit independently, and the C-terminal pS729 tails were manually built. 
The cryo-EM map has been deposited in the Electron Microscopy Data 
Bank and the polyalanine model has been deposited in the Protein 
Data Bank. Data collection and image processing statistics for this 
structure are presented in Extended Data Table 1. 


Expression and purification of the BRAF-MEK kinase domain 
complex 

For insect-cell expression of the BRAF kinase domain in complex with 
MEK1, two recombinant baculovirus species were used. The first was 
prepared using baculoviral transfer vector pFastBac Dual and encoded 
the BRAF kinase domain (BRAF residues 445-723, fused to an N-terminal 
His-tag and a C-terminal chitin-binding domain) and human 
chaperone CDC37. The second baculovirus encoded full-length human 
MEK1(S218A/S222A), as described above. For protein production, 
litre-scale suspension cultures of Sf9 cells were co-infected with both 
viruses. Cells were collected 60-66 h post-infection and resuspended 
in lysis buffer (50 mM Tris pH 8.0, 250 mM NaCl, 5% glycerol and 
20 mM imidazole, 1 mM TCEP) with protease inhibitor cocktail (Roche). 
Resuspended cells were disrupted by sonication on wet ice, and the 
lysate was clarified by ultracentrifugation at 40,000 r.p.m. for two 
hours. Clarified lysate was batch-bound to Ni-NTA beads and washed 
extensively with binding buffer before elution with elution buffer 
(50 mM Tris pH 8.0, 250 mM NaCl, 250 mM imidazole, 1 mM TCEP). 
The elution fractions were pooled and treated with a 1:1,000 molar 
ratio of TEV protease and 100 mM β-mercaptoethanesulfonic acid 
(MESNA) overnight to cleave N-terminal and C-terminal tags, and 
further purified by SEC (Superdex 200 10/300, GE Healthcare) in 
storage buffer (50 mM HEPES pH 7.5, 150 mM NaCl, 1 mM TCEP). The 
fractions were analysed by SDS-PAGE, and fractions corresponding to 
the BRAF-MEK1 kinase domain complex were pooled, concentrated to 
8 mg ml⁻¹, and flash-frozen. 


BRAF-MEK kinase domain crystallization and structure 
determination 

For crystallization, an aliquot of the BRAF-MEK1(S218A/S222A) kinase 
complex was incubated with 5 mM MgCl₂, 2 mM adenosine 
5′-(β,γ-imido)triphosphate (AMP-PNP), and 0.2 mM GDC-0623 in storage 
buffer at 4 °C overnight. Rod-shaped crystals suitable for structure 
determination were obtained by vapour diffusion in hanging drops 
using a reservoir solution consisting of 100 mM Bis-Tris pH 6.5, 200 mM 
ammonium sulfate, and 22% PEG 3350 at room temperature. Crystals 
were collected and flash-frozen in liquid nitrogen using additional 20% 
glycerol as a cryoprotectant. X-ray diffraction data were collected at 
100 K using NE-CAT beamline ID-24-C at the Advanced Photon Source, 
Argonne National Laboratory, at a wavelength of 0.979 Å. Data were 
integrated and merged using XDS and scaled using Aimless in the 
CCP4 suite. The structure was phased by molecular replacement in 
PHASER using the relevant domains of the autoinhibited cryo-EM 
structure and PDB entry 4MNE as initial search models. GDC-0623 
was placed into positive density in an initial Fo − Fc map and included 
in subsequent rounds of refinement using PHENIX.REFINE. Successive 
manual refinement was performed using Coot. The structure 
was refined to Rwork/Rfree values of 0.22/0.25 at a resolution of 2.58 Å. 
Data collection and refinement statistics are presented in Extended 
Data Fig. 5a. 


Mass spectrometry analysis 

BRAF complexes were digested separately with trypsin and Lys-C, 
desalted by C18, dried by vacuum centrifugation, and analysed in 
triplicate by capillary electrophoresis coupled to mass spectrometry 
(CE-MS) using a ZipChip autosampler and CE instrument (908 Devices) 
interfaced to a QExactive HF mass spectrometer (Thermo Fisher 
Scientific). Peptides were loaded for 20 s on an HR chip and electrophoresis 
was performed at 700 V cm⁻¹ for 10 min, with pressure assist activated 
at 1 min. To identify BRAF phosphopeptides, digests were analysed by 
data-dependent tandem mass spectrometry (MS/MS). The five most 
abundant ions in each MS scan (60K resolution) were subjected to MS/ 
MS (15K resolution, 30% collision energy). Dynamic exclusion was 
activated with a repeat count of 1 and an exclusion duration of 6 s. MS/MS 
spectra were converted to .mgf format using multiplierz software, 
and searched against a database of BRAF using Mascot 2.6.1. Search 
parameters specified trypsin or Lys-C specificity with up to two missed 
cleavages; variable oxidation of methionine; variable phosphoryla- 
tion of serine, threonine or tyrosine; fixed carbamidomethylation of 
cysteine; and precursor and product ion tolerances of 10 ppm and 
25 mmu, respectively. Identified sites of phosphorylation were 
confirmed using mzStudio software. In experiments to determine the 
stoichiometry of phosphorylation, digests were analysed by CE-MS 
(MS1 scans with 15K resolution), with precursor peak areas used for 
quantification according to the following equations: 


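$$\%\,\text{Phosphorylation} = \frac{A_{\text{corr.P-pep}}}{A_{\text{tot}}}$$

$$A_{\text{tot}} = A_{\text{corr.P-pep}} + A_{\text{NonP-pep}}$$

$$A_{\text{corr.P-pep}} = A_{\text{P-pep}} \times \text{Corr. factor}$$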


where $A_{\text{corr.P-pep}}$ is the area of the phosphopeptide corrected for 
differences in ionization efficiency due to phosphorylation, $A_{\text{tot}}$ is the total 
peptide peak area, $A_{\text{NonP-pep}}$ is the peak area of the unphosphorylated 
peptide, $A_{\text{P-pep}}$ is the uncorrected peak area of the phosphopeptide, 
and Corr. factor is the correction factor for ionization efficiency. 
Correction factors for ionization efficiency were determined in separate 
experiments using either BRAF protein digests or synthetic BRAF 
peptide standards. Peptides, with or without treatment with alkaline 
phosphatase (pptase), were analysed in triplicate by CE-MS as 
described above. After normalizing for loading amounts (using 
non-phosphorylatable BRAF peptides VFLPNK and LIDIAR for BRAF digests 
or a spiked standard peptide for BRAF synthetics), correction factors 
were calculated according to: 


$$\text{Corr. factor} = \frac{A_{\text{NonP-pep(+pptase)}} - A_{\text{NonP-pep(-pptase)}}}{A_{\text{P-pep(-pptase)}}}$$


where $A_{\text{NonP-pep(+pptase)}}$ is the area of the non-phosphorylated peptide after 
phosphatase treatment, $A_{\text{NonP-pep(-pptase)}}$ is the area of the 
non-phosphorylated peptide without phosphatase treatment, and $A_{\text{P-pep(-pptase)}}$ is the 
area of the phosphorylated peptide without phosphatase treatment. 
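
The stoichiometry calculation defined by the equations above can be sketched as two small functions. This is a minimal illustration rather than the authors' analysis code: the function names are hypothetical, the peak areas in the example are invented placeholder values, and the result is multiplied by 100 on the assumption that the ratio is reported as a percentage.

```python
def correction_factor(nonp_plus_pptase, nonp_minus_pptase, p_minus_pptase):
    """Ionization-efficiency correction factor from peak areas.

    The gain in unphosphorylated-peptide area after phosphatase treatment
    is divided by the phosphopeptide area measured without treatment.
    """
    if p_minus_pptase <= 0:
        raise ValueError("phosphopeptide peak area must be positive")
    return (nonp_plus_pptase - nonp_minus_pptase) / p_minus_pptase


def percent_phosphorylation(a_p_pep, a_nonp_pep, corr_factor):
    """Corrected phosphopeptide area as a percentage of total peptide area."""
    a_corr = a_p_pep * corr_factor   # A_corr.P-pep
    a_tot = a_corr + a_nonp_pep      # A_tot
    if a_tot <= 0:
        raise ValueError("total peak area must be positive")
    return 100.0 * a_corr / a_tot


# Hypothetical peak areas (arbitrary units), for illustration only.
corr = correction_factor(nonp_plus_pptase=300.0,
                         nonp_minus_pptase=100.0,
                         p_minus_pptase=100.0)
pct = percent_phosphorylation(a_p_pep=100.0, a_nonp_pep=100.0, corr_factor=corr)
print(f"Corr. factor = {corr:.2f}, phosphorylation = {pct:.1f}%")
```

A correction factor above 1 means the phosphopeptide ionizes less efficiently than its unphosphorylated counterpart, so the raw phosphopeptide area would underestimate occupancy.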

Because discovery experiments did not detect phosphorylation 
of MEK1 activation loop sites 218/222 (peptide 
LCDFGVSGQLIDSMANSFVGTR), we estimated an upper bound of phosphorylation of 
these residues by analysing digests with or without pptase treatment 
as above. In these experiments, we used targeted selected ion 
monitoring scans of LCDFGVSGQLIDSMANSFVGTR and normalization 
peptides, and then calculated MEK1 activation loop phosphorylation as: 


$$\%\,\text{Phosphorylation} = 1 - \frac{A_{\text{NonP-MEK(-pptase)}}}{A_{\text{NonP-MEK(+pptase)}}}$$


where Anonp-mek(-pptasey Corresponds to the peak area of the unphos- 
phorylated MEK peptide without phosphatase treatment and 
Anon-MEK(+pptasey Corresponds to the peak area of the unphosphorylated 
MEK peptide after phosphatase treatment. Data analysis and peak 
integration were performed using mzStudio software™.To identify 
14-3-3 proteins, MS/MS spectra from data-dependent CE-MS analyses 
of trypsin and Lys-C digested BRAF complexes were converted to .mgf 
format using multiplierz software”, and searched against a forward- 
reverse human protein database (uniprot) with Mascot 2.6.1 (using the 
same search parameters as described above). Data were filtered toa 
false discovery rate of around 1%, and peptide sequences mapped to 
genes using the multiplierz pep2gene tool™. 
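
The upper-bound estimate for MEK1 activation loop phosphorylation described above can be sketched in Python as follows. The function name, the multiplication by 100 to report a percentage, and the clamping of negative values to zero (which can arise from measurement noise when the untreated area slightly exceeds the treated one) are assumptions of this sketch rather than details given in the text.

```python
def mek_percent_phosphorylation(nonp_minus_pptase, nonp_plus_pptase):
    """Upper-bound estimate of phosphorylation from unphosphorylated-peptide areas.

    nonp_minus_pptase: peak area of the unphosphorylated MEK peptide
        without phosphatase treatment.
    nonp_plus_pptase: peak area of the same peptide after phosphatase
        treatment (all copies are then expected to be unphosphorylated).
    """
    if nonp_plus_pptase <= 0:
        raise ValueError("phosphatase-treated peak area must be positive")
    fraction = 1.0 - nonp_minus_pptase / nonp_plus_pptase
    # Clamp at zero: an assumption of this sketch, to absorb noise.
    return 100.0 * max(0.0, fraction)


# Hypothetical normalized peak areas, not measured values.
print(mek_percent_phosphorylation(98.0, 100.0))
```

Because the phosphopeptide itself was not detected, this approach relies only on the change in the unphosphorylated peptide signal, which is why it yields an upper bound rather than a direct measurement.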


Sequence alignments and software 

For Extended Data Figs. 2a, 6, sequences were aligned using ClustalW
and figures were prepared with ESPript 3.0. Structural biology
applications used in this project were compiled and configured by SBGrid.


Reporting summary 
Further information on research design is available in the Nature 
Research Reporting Summary linked to this paper. 
Three-dimensional cryo-EM density maps have been deposited in the
Electron Microscopy Data Bank (EMDB) with accession codes EMD-0541,
EMD-20550, EMD-20552 and EMD-20551. Atomic coordinates
corresponding to these cryo-EM reconstructions have been deposited
in the Protein Data Bank (PDB) with accession codes 6NYB, 6Q0J, 6Q0T
and 6Q0K. Structure factors and atomic coordinates for the BRAF
kinase domain-MEK1 crystal structure have been deposited in the PDB
with accession code 6PP9.
Extended Data Fig. 1 | Biochemical characterization of purified BRAF
complexes and cryo-EM analysis of the autoinhibited BRAF-MEK1-14-3-3
complex. a, The full-length, autoinhibited BRAF-MEK1-14-3-3 complex
used for cryo-EM structure determination. Left, the elution profile from SEC on
a Superdex 200 column; centre, Coomassie-stained SDS-PAGE analysis of
elution fractions; right, analysis by size-exclusion chromatography with
multi-angle light scattering. A molar mass of 196 kDa was indicated; the calculated
molecular weight of the complex is 192 kDa. b, Analysis of co-expression of
human 14-3-3 isoforms with BRAF and MEK1 in insect cells. Left, Strep-tagged
BRAF, MEK1(S218A/S222A) and the indicated Flag-tagged human 14-3-3
isoforms were co-expressed in Sf9 cells and BRAF-MEK1-14-3-3 complexes
were affinity-isolated from clarified lysates with either Strep-TactinXT (left
four lanes) or anti-Flag (right four lanes) magnetic beads. Right, parallel gels
were blotted with an anti-14-3-3 antibody that recognizes all 14-3-3 isoforms
(top blot) or with an anti-Flag antibody (bottom blot). Note that even in the
presence of robust overexpression of these human isoforms, BRAF
preferentially associated with the endogenous insect-cell 14-3-3 proteins (as
seen in the Strep-TactinXT-precipitated lanes of the Coomassie-stained gel).
c, Mass-spectrometry-based quantification of selected phosphorylation sites
in complexes with wild-type BRAF and with BRAF(S365A) purified for
structural analysis. Note that the BRAF(S365A) complex was prepared with
MEK1(S218A/S222A), whereas the wild-type BRAF complex used in this analysis
contained wild-type MEK1. d, Portion of a representative micrograph used for
reconstruction of the autoinhibited BRAF-MEK1-14-3-3 complex.
e, Representative 2D class averages for reconstruction of the autoinhibited
BRAF-MEK1-14-3-3 complex. Scale bar, 10 nm. f, Fourier shell correlation (FSC)
curves for the reconstruction. The horizontal line indicates a correlation of
0.143; the FSC curve for two half-maps (blue) crosses this threshold at a
resolution of 4.1 Å. A correlation curve for the map versus the atomic model is
plotted in red. g, The cryo-EM map of the autoinhibited BRAF-MEK1-14-3-3
complex filtered to 5 Å resolution and contoured at a lower level to reveal
weaker density corresponding to the RBD domain. The map surface is coloured
by domain as in Fig. 1. Unassigned densities (grey) can be ascribed to the RBD
domain and other poorly structured elements as indicated. For gel source data,
see Supplementary Fig. 1. Experiments in a and b were repeated at least twice
with similar results. Imaging experiments in d and e were repeated four times
with similar results.
Extended Data Fig. 2 | 14-3-3 domain sequence alignment and interactions of
the CRD domain in the autoinhibited state. a, Sequence alignment of
insect-cell (S. frugiperda) and human 14-3-3 isoforms. Secondary structure is indicated
above the alignment. Identically conserved residues are shaded red. Symbols
above the alignment indicate contacts with the BRAF CRD domain (purple
squares), kinase domain (blue triangles), and pS365 or pS729 segments (black
circles). b, Interactions of the CRD domain. Domains that contact the CRD are
shown with a transparent surface and the CRD domain is shown as a purple
ribbon with grey spheres representing bound zinc atoms. Sidechains are
shown for CRD residues that correspond to 7 (of 11 total) residues identified in
an alanine scanning mutagenesis study of the CRAF CRD domain. Alanine
mutations in the corresponding residues increased RAS(G12V)-dependent
activation of CRAF. Two mutations in this study fully activated CRAF in the
absence of RAS(G12V); the corresponding BRAF residues are F247 and D249.
F247 makes hydrophobic contacts with both the kinase C-lobe and the 14-3-3
domain, whereas D249 is positioned to form a salt bridge with R691 in the
kinase C-lobe. The remaining four residues are also at sites of interdomain
contacts but are not illustrated (1241, K253, Q262 and K267).


Extended Data Fig. 3 | Interactions of the 14-3-3 dimer with BRAF in the
autoinhibited state. a-d, Cryo-EM density is shown at key sites of interaction
that stabilize the autoinhibited complex, and domains are coloured as in Fig. 1.
a, A portion of the interface between the CRD and 14-3-3 domain. b, The pS365
segment (CR2) bound in the recognition groove of the 14-3-3 domain. c,
Contact between the α8-α9 loop of the 14-3-3 domain and the BRAF kinase
domain. d, The C-terminal pS729 segment coordinated in the opposite
recognition groove of the 14-3-3 dimer. The map is contoured at the same level
in a-d. e, Front and back views of the reconstruction. We observe continuous
density connecting the C terminus of the BRAF kinase and the pS729 14-3-3
binding site (inset). f, Comparison of the binding mode of the pS365 segment in
the present structure with that in a previously determined crystal structure of
an isolated CRAF peptide bound to 14-3-3ζ (PDB ID: 3NKX). The corresponding
region of the present structure (the pS365 segment is shown with orange
carbon atoms and the 14-3-3 domain is shown in tan) is superimposed on the
3NKX crystal structure (shown in blue and cyan), revealing a close
correspondence in conformations of the bound peptides. g, The BRAF CRD is a
hot spot for RASopathy mutations, which map to sites of contact between the
CRD (purple), kinase (blue) and 14-3-3 domains (tan), and are expected to
destabilize the autoinhibited assembly. Sites of RASopathy mutations are
shown in stick form and are labelled. RASopathy mutations in the BRAF kinase
domain (Q709) and CR2 region (red, S365) are also expected to destabilize
these inhibitory intramolecular contacts.
Extended Data Fig. 4 | Additional views and analysis of the BRAF and MEK
kinase domains in the autoinhibited BRAF-MEK1-14-3-3 complex. a, Cryo-EM
density map in the region of the BRAF active site showing bound ATP-γ-S.
b, Cryo-EM density map in the region of the MEK active site indicating bound
ADP, which is probably hydrolysed from ATP-γ-S. Maps in a and b are contoured
at the same level. c, Superposition of the BRAF-MEK1 component of the
present autoinhibited cryo-EM structure (green and dark blue) with the
previously reported crystal structure of a BRAF and MEK1 kinase domain
complex (yellow and light blue; PDB ID: 4MNE). The superposition is based on
the MEK component of the structures, and it reveals a relative rotation of BRAF
of approximately 5° about the C-lobe contact. d, Superposition of the BRAF
kinase domain from the present structure with that of the previously isolated
BRAF-MEK kinase complex (PDB ID: 4MNE). Note that the present structure
(dark blue, with C-helix coloured purple and the activation segment orange)
exhibits key features of an autoinhibited state (C-helix out, with an inhibitory
turn in the activation segment), whereas the previous structure (light blue)
adopts an overall active conformation. e, Detailed view of a portion of the
C-lobe contact between BRAF (blue) and MEK1 (green). f, Portions of the BRAF
(blue) and MEK1 (green) activation segments interact in an anti-parallel
orientation. Activating phosphorylation sites in the MEK1 activation loop are
substituted with alanine in this structure (S218A/S222A), but neither residue is
positioned appropriately for phosphorylation by BRAF. Note that our
discussion of these interactions relies in part on the crystal structures
referenced to build the atomic model, as the cryo-EM map in this region does
not unambiguously define all sidechain conformations. g-j, Comparison of
BRAF kinase domain conformations and relative N- and C-lobe orientations.
g, Sulfonamide-containing BRAF inhibitors perturb the inactive conformation
of BRAF. The BRAF kinase domain in the present structure (blue ribbon, with
C-helix coloured red and the activation segment orange) is superimposed on
the structure of the BRAF kinase domain crystallized as a monomer with
PLX4720 (grey, PDB ID: 4WO5). The superposition is based on the C-lobes of
both kinases, revealing an altered orientation of the N-lobe in the
inhibitor-bound structure (a rotation of around 15°). Note also that the inhibitory turn in
the activation segment helix is replaced by a short helix in the PLX4720
complex. h, Alternative view of the superposition shown in g, highlighting the
axis of rotation (pink arrow) between the N-lobes. i, As in h, but with a
representative inhibitor-bound dimeric BRAF structure superimposed (PDB
ID: 5CSW, a dabrafenib complex). The rotation axes for N-lobe rotations of
dimer structures are shown as green arrows. Note that the orientation of the
rotation axis is similar for all of the dimer structures, but almost orthogonal to
that of the monomer structure in h. In both h and i, the Cα atoms of K522 are
shown as spheres as a point of reference. j, Relative N-lobe rotation of wild-type
and BRAF(V600E) crystal structures available in the Protein Data Bank (PDB)
are compared with the present nucleotide-bound, autoinhibited structure. As
illustrated in h and i, C-lobes of the BRAF kinase domains were superimposed,
and the rotation required to bring the kinase N-lobes into register was
calculated using PyMOL. With the exception of 4MNE, all structures compared
were determined in complex with inhibitors.
Extended Data Fig. 5 | Additional analysis of the crystal structure of the
autoinhibited BRAF-MEK1 kinase domain complex. a, Crystallographic data
collection and refinement statistics for the structure of the BRAF kinase
domain (BRAF-KD) in complex with MEK1(S218A/S222A), AMP-PNP and MEK
inhibitor GDC-0623. Data were recorded from a single crystal. b, The crystal
structure of the autoinhibited BRAF-MEK1 kinase domain complex is
superimposed on the corresponding region of the autoinhibited cryo-EM
structure. c, ATP-analogue AMP-PNP is extensively coordinated in the
autoinhibited state. Hydrogen bonds from coordinating residues are indicated
by dashed lines. d, MEK1 residue E102 in the β3-αC loop is positioned to form a
hydrogen bond with a ribose hydroxyl of the nucleotide bound in the BRAF
active site. e, Rare but recurrent oncogenic mutations in MEK1 map to the
region of the N-terminal helix. A small, in-frame deletion of two residues (E102,
I103) in the β3-αC loop maps to the region of the interface between BRAF and
MEK1. f, RASopathy mutations in BRAF illustrated in the inactive conformation
of the kinase domain. As with oncogenic mutations in many of the same
residues, RASopathy-associated mutations will perturb nucleotide binding
and/or the stability of the inhibitory turn. Notably, residues E501 and T599 form
a hydrogen bond (dashed line) that appears to contribute to the stability of the
inhibitory turn.
Extended Data Fig. 7 | Purification and characterization of wild-type and
BRAF(S365A) complexes. a, BRAF(S365A) was co-expressed with
MEK1(S218A/S222A) in insect cells, purified by serial Ni-NTA agarose and
StrepTrapHP affinity chromatography, and subjected to SEC on a Superose 6
column. The SEC elution trace is shown on the left with a Coomassie-stained
SDS-PAGE gel of elution fractions on the right. A parallel gel was blotted with
an antibody against pS729 (bottom right). BRAF activity in each fraction was
measured in a MEK phosphorylation assay (top right; see Methods for assay
details). b, c, Side-by-side comparison of wild-type BRAF complexes isolated
from insect cells without (b) and with (c) co-expression of MEK1(S218A/S222A).
Complexes were purified by serial Ni-NTA agarose and StrepTrapHP affinity
chromatography and subjected to SEC on Superose 6. The SEC elution traces
are shown on the left with Coomassie-stained SDS-PAGE gels of elution
fractions on the right. BRAF activity in each fraction was measured in a MEK
phosphorylation assay as described above (top right). Note that co-expression
of MEK1 markedly decreases the void peak and enables the isolation of a
late-eluting peak (around 15 ml) with little MEK-phosphorylation activity that
corresponds to the autoinhibited BRAF-MEK1-14-3-3 monomer complex
(c, fractions B8-C3). d, Wild-type BRAF was expressed in mammalian HEK293
cells, purified by serial Ni-NTA agarose and StrepTrapHP affinity
chromatography, and subjected to SEC on Superdex 200. e, Elution fractions
from the wild-type BRAF-14-3-3 SEC run in d are analysed by SDS-PAGE and
western blotting, revealing that BRAF co-purifies with endogenous human
14-3-3 proteins. Fractions were also blotted for total BRAF (anti-Strep II), pS365
and pS729. f, Mass spectrometry analysis of trypsin and Lys-C protease digests
of peak fractions of the BRAF-14-3-3 complex from HEK293 cells revealed
multiple peptide sequences that mapped uniquely to six of the seven human
14-3-3 isoforms. The δ and α isoforms are phosphorylation variants of ζ and β,
respectively. For gel source data, see Supplementary Fig. 1. SEC experiments
were repeated at least three times (a-e), activity assays twice (a) and once (b, c),
and blotting twice (e) with similar results.
Extended Data Fig. 8 | Cryo-EM imaging of dimeric, active-state BRAF
complexes and mass-spectrometry-based measurement of
phosphorylation stoichiometry in BRAF-14-3-3 and BRAF-MEK1-14-3-3
complexes. a, Representative 2D class averages for the
BRAF(S365A)-MEK1-14-3-3 complex. Scale bar, 10 nm. b, FSC curves for the
BRAF(S365A)-MEK1-14-3-3 reconstructions presented in Fig. 4. c, SEC (Superose 6) traces for the
indicated affinity-isolated BRAF complexes, prepared using insect or
mammalian cells as described in Extended Data Fig. 7 and Methods. SEC
experiments were repeated at least three times with similar results. d-f, Per
cent phosphorylation of selected BRAF sites in successive elution fractions is
plotted for each sample analysed in c. Fractional phosphorylation of these sites
was measured using a mass-spectrometry-based assay (see Methods).
d, Wild-type BRAF-14-3-3 complex produced by insect cells. e, Wild-type BRAF-14-3-3
complex produced by mammalian cells. f, BRAF-MEK1-14-3-3 complex
produced in insect cells, prepared by co-expression of wild-type BRAF and
MEK1(S218A/S222A). In d-f, note the high fractional phosphorylation of S729
in all samples, and the negligible phosphorylation of activation segment sites
T599 and S602. g, Representative 2D-class averages for wild-type BRAF-14-3-3
complexes prepared from mammalian cells with (top) and without (bottom)
the addition of BRAF inhibitor GDC-0879 (1 μM). Both samples yielded class
averages indicative of the same particle architecture, but those of the
drug-treated sample revealed better-defined secondary structure. Scale bar, 10 nm.
h, FSC curve for the wild-type BRAF-14-3-3 reconstruction. i, Single-particle
reconstruction of the wild-type BRAF-14-3-3 complex produced in
mammalian cells treated with GDC-0879. The reconstruction reveals a
back-to-back BRAF kinase domain dimer with a 14-3-3 dimer bridging between its
C-terminal pS729 tails. Comparison of these front and back views reveals the
highly asymmetric position of the 14-3-3 dimer with respect to the dimerized
kinase domains. Imaging experiments in a and g were repeated twice with
independent preparations, and gave similar results.


[Figure panels: Autoinhibited monomer; Open monomer; Active dimer]

Extended Data Fig. 9 | Structural snapshots outline a model for RAF
activation. The RBD domain is exposed in the context of the autoinhibited
BRAF-MEK1-14-3-3 monomer complex, enabling high-affinity binding to
farnesylated, GTP-loaded RAS at the plasma membrane. We propose that
‘extraction’ of the CRD domain upon binding to prenylated RAS at the
membrane is a key step in RAF activation. Without the stabilizing interactions
of the CRD domain, the 14-3-3 domain can release from the BRAF kinase domain
and pS365 segment to form an ‘open’ monomer. We expect the RAF-MEK
kinase module of the open monomer to maintain its inactive, ATP-bound
conformation as observed in the crystal structure described here. Finally, the
14-3-3 domain can rearrange to bind the C-terminal pS729 sites of two open RAF
molecules, driving formation of the active, back-to-back RAF dimer. As
illustrated here, the stoichiometry of 14-3-3 binding changes upon activation,
but we do not exclude the possibility that a second 14-3-3 dimer remains
associated with the complex, for example by bridging the pS365 segments. KD,
RAF kinase domain; red circles (pSer) represent the pS365 and pS729 14-3-3
binding segments.
Extended Data Table 1 | Cryo-EM data collection, refinement and validation statistics

Columns: (1) BRAF-MEK1-14-3-3 (EMD-0541, PDB 6NYB); (2) BRAF(S365A)-MEK1-14-3-3 with two MEK (EMD-20550, PDB 6Q0J); (3) BRAF(S365A)-MEK1-14-3-3 with one MEK (EMD-20552, PDB 6Q0T); (4) BRAF-14-3-3 (EMD-20551, PDB 6Q0K)

                                  (1)                                   (2)             (3)             (4)
Data collection and processing
Magnification                     130,000×                              36,000×         36,000×         105,000×
Voltage (kV)                      300                                   200             200             300
Electron exposure (e−/Å²)         ~50                                   ~50             ~50             ~70
Defocus range (μm)                −1.8 to −3.3                          −2.0 to −3.0    −2.0 to −3.0    −1.7 to −2.7
Pixel size (Å)                    1.06 (2× binned)                      1.11            1.11            1.7 (2× binned)
Symmetry imposed                  C1                                    C1              C1              C1
Initial particle images (no.)     3,531,955                             2,008,323       2,008,323       365,083
Final particle images (no.)       165,298                               425,135         595,672         66,215
Map resolution (Å)                4.1                                   4.9             5.7             6.8
  FSC threshold                   0.143
Refinement
Initial model used (PDB code)     5FD2, 4MNE, 6PP9, 4FJ3, 1FAR, 3NKX    4MNE, 6NYB      4MNE, 6NYB      4MNE, 3NKX
Map sharpening B factor (Å²)      −225
Model composition
  Non-hydrogen atoms              8811
  Protein residues                1097
  Ligands
  Metals
B factors (Å²)
  Protein
  Ligand
R.m.s. deviations
  Bond lengths (Å)
  Bond angles (°)
Validation
  MolProbity score
  Clashscore
  Poor rotamers (%)
Ramachandran plot
  Favored (%)
  Allowed (%)
  Disallowed (%)
Model vs Data
  CC (mask)
  CC (box)
  CC (peaks)
  CC (volume)
  Mean CC for ligands
Statistics


For all statistical analyses, confirm that the following items are present in the figure legend, table legend, main text, or Methods section. 


n/a | Confirmed 


The exact sample size (n) for each experimental group/condition, given as a discrete number and unit of measurement 


A statement on whether measurements were taken from distinct samples or whether the same sample was measured repeatedly 


The statistical test(s) used AND whether they are one- or two-sided 
Only common tests should be described solely by name; describe more complex techniques in the Methods section. 


A description of all covariates tested 


A description of any assumptions or corrections, such as tests of normality and adjustment for multiple comparisons 


A full description of the statistical parameters including central tendency (e.g. means) or other basic estimates (e.g. regression coefficient)
AND variation (e.g. standard deviation) or associated estimates of uncertainty (e.g. confidence intervals)


For null hypothesis testing, the test statistic (e.g. F, t, r) with confidence intervals, effect sizes, degrees of freedom and P value noted 
Give P values as exact values whenever suitable.


For Bayesian analysis, information on the choice of priors and Markov chain Monte Carlo settings 


For hierarchical and complex designs, identification of the appropriate level for tests and full reporting of outcomes 


Estimates of effect sizes (e.g. Cohen's d, Pearson's r), indicating how they were calculated 


Our web collection on statistics for biologists contains articles on many of the points above. 


Software and code 


Policy information about availability of computer code 


Data collection SerialEM for Krios microscope and K2, K3 detector 


Data analysis Standard widely available software was used for structure determination, including RELION version 3, crYOLO, Coot and PHENIX. 
References are provided in the Methods references. 


For manuscripts utilizing custom algorithms or software that are central to the research but not yet described in published literature, software must be made available to editors/reviewers. 
We strongly encourage code deposition in a community repository (e.g. GitHub). See the Nature Research guidelines for submitting code & software for further information. 


Data 


Policy information about availability of data 
All manuscripts must include a data availability statement. This statement should provide the following information, where applicable: 


- Accession codes, unique identifiers, or web links for publicly available datasets 
- A list of figures that have associated raw data 
- A description of any restrictions on data availability


The cryo-EM maps and atomic coordinates have been deposited with the EMDB and PDB, respectively (EMD-0541, EMD-20550, EMD-20552, EMD-20551; PDB:
6NYB, 6Q0J, 6Q0T, 6Q0K, 6PP9).
Field-specific reporting


Please select the one below that is the best fit for your research. If you are not sure, read the appropriate sections before making your selection. 


[x] Life sciences    [ ] Behavioural & social sciences    [ ] Ecological, evolutionary & environmental sciences


For a reference copy of the document with all sections, see nature.com/documents/nr-reporting-summary-flat.pdf 


Life sciences study design 


All studies must disclose on these points even when the disclosure is negative. 


Sample size Not applicable
Data exclusions Not applicable
Replication Not applicable
Randomization Not applicable
Blinding Not applicable


Reporting for specific materials, systems and methods 


We require information from authors about some types of materials, experimental systems and methods used in many studies. Here, indicate whether each material, 
system or method listed is relevant to your study. If you are not sure if a list item applies to your research, read the appropriate section before selecting a response. 


Materials & experimental systems Methods 

n/a | Involved in the study n/a | Involved in the study 
Antibodies ChIP-seq 
Eukaryotic cell lines Flow cytometry 
Palaeontology MRI-based neuroimaging 


Animals and other organisms 


Human research participants 


Clinical data 
Antibodies 

Antibodies used (antibody; vendor; Cat#; RRID)

Mouse monoclonal Penta-His Antibody; Qiagen; Cat# 34660; RRID:AB_2619735
Rabbit polyclonal BRAF antibody; ThermoFisher Scientific; Cat# PA5-14926; RRID:AB_10975898
Rabbit monoclonal Anti-BRAF (Phospho S729); abcam; Cat# Ab124794; RRID:AB_10976055
Rabbit polyclonal anti Phospho-CRAF(S259) antibody; Cell Signaling Technology; Cat# 9421S; RRID:AB_330759
Rabbit polyclonal Anti-Strep-tag II; abcam; Cat# Ab76949; RRID:AB_1524455
Mouse monoclonal DYKDDDDK tag (9A3); Cell Signaling Technology; Cat# 8146S; RRID:AB_10950495
Rabbit polyclonal Anti 14-3-3 Beta/alpha; Cell Signaling Technology; RRID:AB_560823
Rabbit monoclonal Anti 14-3-3 zeta/delta (D7H5); Cell Signaling Technology; Cat# …3S; RRID:AB_10950820
Rabbit monoclonal Anti 14-3-3 gamma (D15B7); Cell Signaling Technology; RRID:AB_10827887
Rabbit polyclonal Anti 14-3-3 epsilon; Cell Signaling Technology; Cat# …5S; RRID:AB_2217758
Rabbit monoclonal Anti 14-3-3 eta (D23B7); Cell Signaling Technology; Cat# …1S; RRID:AB_10829034
Rabbit polyclonal Anti 14-3-3 tau; Cell Signaling Technology; Cat# …8S; RRID:AB_2218251
Rabbit polyclonal Anti 14-3-3 (pan); Cell Signaling Technology; Cat# 8321S; RRID:AB_10860606
Rabbit polyclonal Anti-phospho-MEK1/2 (S217/221) antibody; Cell Signaling Technology; Cat# 9121S; RRID:AB_330745
Rabbit polyclonal Anti-MEK1/2 antibody; Cell Signaling Technology; Cat# 9122S; RRID:AB_823567
Anti-mouse IgG, HRP-linked secondary antibody; Cell Signaling Technology; Cat# 7076S; RRID:AB_330924
Donkey anti-Rabbit IgG, HRP-linked secondary antibody; GE Healthcare; Cat# NA934V; RRID:AB_772191


Validation Listed RRID for each antibody. 


Eukaryotic cell lines 


Policy information about cell lines 


Cell line source(s) HEK293 (Expi293F) cells from Thermo-Fisher Scientific 
Authentication Expi293F cells were obtained directly from Thermo-Fisher Scientific, and were used only for protein production.
Mycoplasma contamination negative 


Commonly misidentified lines n/a 
(See ICLAC register) 


Corrections & amendments 


Author Correction: 
Exome sequencing of 
Finnish isolates enhances
rare-variant association 
power 


https://doi.org/10.1038/s41586-019-1726-x 


Correction to: Nature https://doi.org/10.1038/s41586-019-1457-z 


Published online 31 July 2019 


Adam E. Locke, Karyn Meltz Steinberg, Charleston W. K. Chiang,
Susan K. Service, Aki S. Havulinna, Laurel Stell, Matti Pirinen,
Haley J. Abel, Colby C. Chiang, Robert S. Fulton, Anne U. Jackson,
Chul Joo Kang, Krishna L. Kanchi, Daniel C. Koboldt, David E. Larson,
Joanne Nelson, Thomas J. Nicholas, Arto Pietilä, Vasily Ramensky,
Debashree Ray, Laura J. Scott, Heather M. Stringham,
Jagadish Vangipurapu, Ryan Welch, Pranav Yajnik, Xianyong Yin,
Johan G. Eriksson, Mika Ala-Korpela, Marjo-Riitta Järvelin,
Minna Männikkö, Hannele Laivuori, FinnGen Project,
Susan K. Dutcher, Nathan O. Stitziel, Richard K. Wilson, Ira M. Hall,
Chiara Sabatti, Aarno Palotie, Veikko Salomaa, Markku Laakso,
Samuli Ripatti, Michael Boehnke & Nelson B. Freimer


In this Article, several errors have been drawn to our attention. In the
author list, some of the affiliation numbers were incorrect (see PDF
version of the original Article for affiliation numbering): Robert S. Fulton’s
affiliations should be 2,13 rather than 2; Hannele Laivuori’s affiliations
should be 8,35,36 rather than 7,35,36; Chiara Sabatti’s affiliations should
be 10,39 rather than 9,39; Aarno Palotie’s affiliations should be 8,40,41
rather than 7,40,41; and Samuli Ripatti’s affiliations should be 8,11,41
rather than 7,11,41. In addition, the faculty of affiliation 36 should read
‘Faculty of Medicine and Health Technology’ rather than ‘Faculty of
Medicine and Life Sciences’. In the main text on page 326, the protein
consequence for the associated KRT40 variant, listed as ‘Ser32Pro’,
should be ‘Ser328Pro’. In the legend for Extended Data Fig. 7, the
definitions of SuK and Nfi were missing and have been added. To clarify our
description of the Finnish map in Fig. 3, the last line of the legend should
read ‘Birthplaces of carrier and non-carrier individuals were plotted
on a map of Finland, including regions of Finland, later ceded, as they
existed before the Second World War.’ rather than ‘Birthplaces of carrier
and non-carrier individuals were plotted on a map of Finland, including
regions that were ceded before the Second World War.’. In the
‘Apolipoprotein B’ section of the Supplementary Information, the protein
consequence for the associated AP1M2 variant, listed as ‘418Asn’, should
be ‘Tyr418Asn’. In the ‘Multiplicity adjustment procedure’ section of
the Supplementary Information, we corrected the annotation in step
II. (a) to correctly identify the ordered P values (m) for all 64 traits at
each variant. These errors have all been corrected online.


Author Correction: 
LKB1 loss links serine 
metabolism to DNA 
methylation and 
tumorigenesis 


https://doi.org/10.1038/s41586-019-1696-z 


Correction to: Nature https://doi.org/10.1038/nature20132 


Published online 31 October 2016 


Filippos Kottakis, Brandon N. Nicolay, Ahlima Roumane, 
Rahul Karnik, Hongcang Gu, Julia M. Nagle, Myriam Boukhali, 
Michele C. Hayward, Yvonne Y. Li, Ting Chen, Marc Liesa, 
Peter S. Hammerman, Kwok Kin Wong, D. Neil Hayes, 

Orian S. Shirihai, Nicholas J. Dyson, Wilhelm Haas, 
Alexander Meissner & Nabeel Bardeesy 


In this Article, there were several errors, as follows. The ‘Gene 
Expression Profiling’ section of the Supplementary Methods should 
have included additional information about differential expression 
analysis, gene set enrichment analysis (GSEA) analysis and the iden- 
tification of untranslated regions (UTRs). The corrected paragraph is 
as follows: ‘RNA-sequencing was performed using total RNA isolated 
from two independent cell lines from the K (three replicates total) or 
KL (four replicates total) genotypes or from the two independent KL 
lines transduced with full-length LKBI cDNA (rescue, two replicates). 
RNAseg library-preparation and sequencing were performed by the 
Tufts University Genomics Core Facility. Data were processed using 
a standard RNA-seq pipeline that used Tophat2’ to align the reads 
to mm49, and the Cufflinks suite” to calculate expression values and 
differential expression. Gene Set Enrichment Analysis (GSEA) (http:// 
www.broadinstitute.org/gsea/index.jsp) of the expression data was 
used to assess enrichment of the KEGGas well as the SGOC geneset™ ™. 
There were 2,520 differentially regulated autosomal genes between K 
and KL samples based on q-value as reported by the cufflinks suite (see 
Supplementary Table 1 of this Amendment). This list was uploaded to 
the UCSC Genome Browser to extract promoter, intron, exonandUTR 
sequences. 2,443 of the total 2,520 genes were identified by the algo- 
rithm and were associated with 4,706 UTRs. Inall cases, pairwise GSEA 
was performed by creating lists of genes using the FPKM value reported 
by cufflinks of K to KL or KL to rescue and P values were obtained by 
permuting the gene set (1,000 permutations). To calculate the statis- 
tical significance of SGOC pathway enrichment, the SGOC genelist” 
was added to the KEGG signature list and GSEA was performed using 
this modified KEGG signature list. Raw sequencing files can be found 
under the Superseries record GSE86145 (http://www.ncbi.nIm.nih.gov/ 
geo/query/acc.cgi?acc=GSE86145). The Supplementary Information 
to this Amendment contains the Supplementary Table 1 cited above. 
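
To illustrate what a gene-set permutation test of this kind involves, the following Python sketch computes an unweighted running-sum enrichment score and an empirical P value from 1,000 random gene sets of the same size. It is a simplified stand-in, not the GSEA software cited above: the function names, the unweighted statistic and the fixed random seed are assumptions made purely for illustration.

```python
import random


def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum score: step up at set members, down otherwise."""
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(ranked_genes) - n_hit
    if n_hit == 0 or n_miss == 0:
        return 0.0
    running, best = 0.0, 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running  # keep the largest deviation from zero
    return best


def permutation_p_value(ranked_genes, gene_set, n_perm=1000, seed=0):
    """Empirical two-sided P value obtained by permuting the gene set."""
    rng = random.Random(seed)  # fixed seed (assumption) for repeatability
    gene_set = set(gene_set) & set(ranked_genes)
    observed = enrichment_score(ranked_genes, gene_set)
    extreme = 0
    for _ in range(n_perm):
        random_set = set(rng.sample(ranked_genes, len(gene_set)))
        if abs(enrichment_score(ranked_genes, random_set)) >= abs(observed):
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)
```

Here `ranked_genes` is assumed to be a list of gene names ordered by an expression metric, and `gene_set` the pathway members to test.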

In addition, the following sentence should have been included at 
the end of the ‘Liquid Chromatography Mass Spectrometry’ section 
of the Supplementary Methods: ‘Proteomics data were uploaded to 
https://massive.ucsd.edu/ and can be found under the accession number
MSV000082186.’


In the ‘Materials’ section of the Supplementary Methods, the text 
‘PSAT1 (sc-133929) from Santa Cruz’ should have been added after the 
information on the 5-hydroxymethylcytosine antibody.

The following information should have been provided at the end of 
the ‘SDS-PAGE Analysis’ section of the Supplementary Methods: ‘For 
Extended Data Fig. 1k (GLUT1, actin), Extended Data Fig. 2e (LKB1, GLDC,
PSAT1, actin), Extended Data Fig. 5c (DNMT1, DNMT3A, actin), Extended
Data Fig. 5h (H3K36me3, H3K27me3, H3K4me3, H3), Extended Data
Fig. 7f (LKB1, total AMPK, pAMPK (T172), actin), Extended Data Fig. 7g
(pACC (S79), total ACC, actin) and Extended Data Fig. 7w (p-p70S6K
(T389), p70S6K, actin), samples were derived from each corresponding
experiment and blots were processed in parallel. For Extended Data
Fig. 7f, the actin membrane was stripped and reprobed for total ACC
and the pAMPK membrane was stripped and reprobed for pACC (S79).’

Knockdown efficiencies for shRNAs against AMPKα1 and AMPKα2,
and against DNMT1 and DNMT3A (as used in experiments in Extended
Data Figs. 7 and 9, respectively) were not provided in the original Article,
and are now included as Supplementary Fig. 1 in the Supplementary
Information to this Amendment.

In Extended Data Fig. 7f, the blot for AMPKα was inadvertently vertically
flipped. See Fig. 1 of this Amendment for a corrected version of the
panel. In the top middle (total AMPK) and bottom left (pAMPK (T172))
blots of the panel for Extended Data Fig. 7f in Supplementary Data Fig.
1 of the original Article, the molecular mass markers for total AMPK and
pAMPK were mislabelled, and the bottom three markers should have
been ‘25’, ‘37’ and ‘50’ kDa instead of ‘37’, ‘50’ and ‘75’ kDa.

In the SGOC network diagram in Extended Data Fig. 4a, the AHCY 
enzyme should have been shown catalysing the reaction from SAH to 
HCY instead of the reaction from HCY to Met. 

In the legend to Extended Data Figs. 4g and 5a, the sentence: ‘The data
plotted are expressed as mean-centred values’ should have stated: ‘The 
data plotted are expressed using a min-to-max relative colour scheme’. 

In the legends to Extended Data Figs. 1k, 2e and 5c, the text: ‘Actin 
was used as the loading control’ should have stated: ‘For western blot 
analyses, samples were derived from the same experiment and blots 
were processed in parallel. Actin was used as the sample processing 
control.’ Similarly, the legend to Extended Data Fig. 7f, g and w should
have included the text: ‘Samples were derived from the same experiment
and blots were processed in parallel. Actin was used as the sample
processing control.’

The legend to Extended Data Fig. 5h should have included the text: 
‘Samples for H3K36me3, H3K27me3 and H3 were derived from the 
same experiment. Samples for H3K4me3 and H3 were derived from the
same experiment. Blots were processed in parallel and H3 was used as
the sample processing control.’

The Supplementary Information of this Amendment contains Sup- 
plementary Table 1 and Supplementary Fig. 1, as described above. The 
original Article has not been corrected online. 


Supplementary information is available in the online version of this 
Amendment. 
Fig. 1 | This is the original, incorrect published Extended Data Fig. 7f, and the
corrected Extended Data Fig. 7f. The blot for AMPKα was inadvertently
flipped vertically in the published figure.

Graduate students: the tortuous truth
go.nature.com/phdsurvey2019


JUST A MINUTE... PHD STUDENTS
VOICE CONCERNS ON MENTORING 


In this second article to mark Nature’s 2019 graduate survey, respondents call 
for more one-to-one support and better career guidance. By Chris Woolston 


When Peter Butler started his PhD programme in physics at the
University of Bristol, UK, he saw himself spending many hours at
a whiteboard working on problems, with his supervisor by his side. Those
long hours of togetherness never materialized.
In that sense, he says, “I didn’t get what
I expected.” However, he adds that his supervisor
gave him plenty of good strategic advice
and helped him to get published. And having
to turn to other people for support was useful,
he adds. “I had to act like a scientist.”

Butler was one of more than 6,300 graduate 
students worldwide who responded to Nature’s 
fifth biennial PhD survey. These students had 
much to say about the state of mentorship at 
their institutions and in the scientific community.
Their answers and free-text comments
made clear that they often aren’t getting what
they expect, or need, from their supervisors.
The full data set is available at go.nature.com/2nqjndw.
One telling statistic was that
nearly one in four said they would change their 
supervisor if they could start their programme 
again; the 2017 figure was similar. 

The survey — created with Shift Learning, a 
London-based market-research company — had 
its bright spots. Overall, 67% of respondents 
said they were satisfied with their relationship 
with their supervisors, with 41% of those in 
Africa and South America saying they were very
satisfied. Some are especially grateful. “When
I started my PhD, I didn’t know about all of the
possibilities,” says Marina Kovačević, a PhD
student in physical chemistry at the University
of Novi Sad in Serbia. Now, she hopes to run her
own laboratory, a goal that her co-supervisors
encourage by letting her help to write proposals
and take on other tasks of a lab leader. “She is
truly one of the most devoted PhD students,”
says one supervisor, Branislav Jović.

But roughly one-fifth of respondents said 
that they were dissatisfied with their super- 
visor relationship, a disconnect that threatens 
their future as well as their present. “Students 
who are effectively mentored outperform those
who aren't,” says Ruth Gotian, assistant dean 
for mentoring at Weill Cornell Medical College 
in New York City. A coming report from the US 
National Academies of Sciences, Engineering, 
and Medicine notes that positive mentorship 
is the “most important factor in completing 
a STEM [science, technology, engineering or 
mathematics] degree”. The report also cites 
studies showing that effectively mentored 
students are more likely to publish papers, 
and more likely to finish their PhD programme.

Luckily for students, mentorship needn't 
be a one-person job. The survey results underscore
the importance of networks that can fill in
gaps when a supervisor falls short, says Emma
Williams, an author and career coach and 
founder of EJW Solutions, a scientist-advisory 
company in Cambridge, UK. “PhD students 
should be encouraged right from the start to 
have a variety of mentors,” says Williams, who
earned her degree in medical physics from the 
University of Cambridge. 


No time for career advice 


Many graduate students have discovered that 
not all mentors can devote much time to the 
job. In the survey, 49% of students reported 
spending less than an hour one-to-one with 
their supervisor each week (see ‘Brief encounters’).
“That’s a shocking figure,” Williams says.
Although some students can probably thrive on
that amount, or on even less, most could benefit
from more direct guidance and attention, she
says. She speaks from personal experience; her
own highly accomplished PhD adviser didn’t
have the time to build a strong connection. “He
called me by the wrong name in the middle of
my PhD,” she says. “That was a low point.”

Job prospects are a persistent worry for PhD 
students, but they can’t always count on their 
supervisors to show them the way forward. In 
the survey, just one-third of respondents said 
that they were satisfied with the career guid- 
ance they received from their mentors and 
others in their PhD programme, down from 
40% in the 2017 survey. When asked how they 
arrived at their current career decision, just 
28% credited advice from their supervisor, 
down from 34% in the survey two years ago. 

Notably, 60% of respondents said that 
they based their career decision on their own 
research of the topic. Unfortunately, students 
who try the do-it-yourself approach probably
won't be aware of all of their options, Williams
says. “They’re only going to google the things
that they already have in mind,” she adds.

BRIEF ENCOUNTERS

Interactions with a supervisor can be a crucial
part of PhD training, but some students get much
more individual time than others.

Q: On average, how much one-to-one time do
you spend with your supervisor each week? (6,320 respondents)
Less than one hour: 49%
Between one and three hours: 35%
More than three hours: 12%
Not specified: 2%
Percentages do not add up to 100 because of rounding.


BULLYING FROM THE TOP

A substantial number of PhD students feel bullied in their programmes. Speaking
out is difficult, partly because supervisors tend to be the chief culprits.

Q: Do you feel that you have experienced
bullying in your PhD programme? (6,313 respondents)
Yes: 21%
Prefer not to say: 4%

Q: Who was the perpetrator(s)?
Categories shown: supervisor, other academic staff, another student,
postdoc, online troll, prefer not to say.

Q: Do you feel that you can speak out about your
experiences of bullying without repercussions?
Categories shown: no, yes, unsure.
Percentages do not add up to 100 because of rounding.


Many advisers seem too preoccupied with
their own science to offer careers advice, says
Nick Valverde, a PhD student in physics at the
US National Superconducting Cyclotron Laboratory,
located on the campus of Michigan
State University in East Lansing. “It’s almost
impossible to find someone who knows about
career trends,” he says. “Mentors have a lot on
their plates, and trends change.” Guidance for
careers in industry can be especially hard to
come by. Only 28% of respondents said that
they had received useful advice for pursuing
a career outside academia.


Unready for duty 


Part of the problem, Gotian says, is that 
mentors who have spent their entire careers 
in academia might not think much about other 
career paths. “Very often, mentors will try to 
create ‘mini-mes’, another version of them- 
selves,” she says. But if mentors took off the 
academic blinkers, they could boost their 
students’ career prospects without much 
effort, she adds. “They may not have much 
knowledge of industry, but they probably 
have contacts that they could connect their 
students with. That doesn’t happen as often 
as it should.” 

A further problem is that mentors don’t 
necessarily receive much training in people 
management, a shortcoming that can con- 
tribute to especially dark consequences. In 
the survey, 21% of respondents reported expe- 
riencing discrimination or harassment. The 
same percentage also reported bullying. Of 
those, nearly half said that their supervisor 
was the perpetrator (see ‘Bullying from the 
top’). “In a results-driven culture, you're very
dependent on people senior to you to move 
on with your career,” Williams says. “It’s very 
fertile ground for bullying and harassment.” 

These numbers once again reinforce the 
need to have more than just one person on a
student’s side, Williams says. “One of my clients
at a prominent university was being bullied,”
she says. “Finding someone else that she could
use as a sounding board really helped her.”

With so much at stake, choosing a mentor
or a mentorship team can be one of the most
important tasks a graduate student can face.
Kovačević says that she peppered potential
advisers with questions before joining her 
current lab. “I thought it was my right to ask 
anything,” she says. “And I thought it was their 
job to answer.” 

But not all students have that option. “I 
had no say in choosing my PhD adviser,” says 
Samhita Krishnaswamy, a PhD student in 
psychology at Jain University in Bengaluru, 
India. She says that she felt inspired by an 
accomplished professor in her programme. 
But he was not her adviser, and she rarely had 
a chance to speak to him in person. She feels
that supervisors, in general, could be better 
prepared to guide their students. “In India, 
supervisors need more in-depth skill sets,” 
she says. “They’re mostly looking at furthering 
their own careers. They are very uncomfortable
pursuing topics outside of their research area.” 

Even so, Krishnaswamy says that she’s happy
with her overall training. She’s had the flexi- 
bility to study multiple topics in psychology, 
including the psychology of Indigenous populations
of India. “I have everything I need here,”
she says. “It’s given me a foundation to be an 
independent researcher.” 

In his third year at the National Super- 
conducting Cyclotron Laboratory, Valverde 
says that he’s building a foundation, too. But 
it wasn’t easy. At first, he was intimidated by 
the experience and knowledge of Cyclotron 
researchers. “You’re working with someone who
has 40 years under their belt,” he says. “They’re
talking about particles and symmetry, and I’m
like, man, I know about tension in a rope.”

Valverde managed to bridge some of those 
gaps in his knowledge and form real connec- 
tions with researchers at the lab — because 
he had to. Ultimately, he says, science is too 
challenging to tackle without help. “It could 
be crippling if you tried to do it alone,” he says.
“That’s where a mentor comes in.” 


Chris Woolston is a freelance writer in Billings, 
Montana. 

FIXING GENOME ERRORS
ONE BASE AT A TIME 


Genetic base editors can efficiently correct point mutations in cell 
lines, animal models and perhaps the clinic. By Sandeep Ravindran 


When Xingxu Huang began thinking
about correcting disease-causing
mutations in the human genome,
his attention turned naturally
to CRISPR-Cas9. But it quickly
became clear that the popular gene-editing
tool wasn’t ideal for the majority of human disease
mutations, which result from errors in
single DNA nucleotides known as point mutations.
More than 31,000 such mutations in the
human genome are known to be associated 
with human genetic diseases. But CRISPR is 
not particularly efficient at correcting them. 

Then Huang learnt about base editors, a new
class of genome-modifying proteins that excel 
at single-site mutations. 

Base editors chemically change one DNA 
base to another without completely breaking 


the DNA backbone. The first cytosine base 
editor (CBE), which chemically converts a 
cytosine-guanine (C-G) base pair into a thy- 
mine-adenine (T-A) base pair at a targeted 
genomic location, was developed in 2016 
by chemical biologists David Liu and Alexis 
Komor at Harvard University in Cambridge, 
Massachusetts. Another researcher in Liu’s
laboratory, Nicole Gaudelli, developed the
first adenine base editor (ABE) a year later; it
chemically transforms A-T to G-C base pairs. 
“Base editing gives very, very good effi- 
ciency, about 40-50% efficiency for cell lines,” 
says Huang, a geneticist at ShanghaiTech Uni- 
versity in China. “That’s very high efficiency 
compared with traditional genome editing,” 
which is only one-tenth as efficient, he says. 
But base editors are not just more efficient 
than CRISPR-Cas9; they also cause fewer
errors. CRISPR-Cas9 acts as molecular scis- 
sors that cut both strands of DNA. As the 
cell repairs the break, random bases can be 
inserted or deleted (indels), altering the gene 
sequence. Large chromosomal segments 
might even be deleted or rearranged. By alter- 
ing just a specific nucleotide without making 
double-stranded breaks, base editors cause 
fewer unwanted mistakes. 

Researchers have applied these tools 
across the evolutionary tree, from bacteria 
and yeast to rice, wheat, zebrafish, mice, 
rabbits and monkeys. They have used them 
to knock out genes, and to create and correct 
animal models. They have applied them in 
very early human embryos in the laboratory. 
And they might one day use base editors to 
treat human genetic diseases.

First, however, researchers have to over- 
come some key hurdles. Like CRISPR-Cas9, 
base editors sometimes edit sites other 
than their target. They are limited in which 
genomic regions they can edit and what base 
conversions they can perform. And if they are
ever to be used in the clinic, researchers will 
have to get better at delivering them into 
tissues. 

But improved editors are being developed at
a rapid rate. “It’s really a testament to how fast
researchers have made progress in the field
that we now have dozens of base editors that
offer expanded targeting scope, improved DNA
specificity and reduced off-target activity,” says
Liu. His base-editor constructs have been sent
out to more than 1,000 laboratories around the
world, he says, and new papers that use these 
and related tools appear almost weekly. 


Building an editor 

To create the first base editor, Komor took 
advantage of a naturally occurring enzyme 
called APOBEC1. This enzyme, which is part 
of the cytidine deaminase family, chemically 
converts C to uracil (U), an analogue of T that
occurs in RNA. Komor fused rat APOBEC1 to
a catalytically impaired Cas9 nuclease that is
unable to create DNA double-strand breaks.
When a guide RNA directs the APOBEC1-Cas9
fusion protein to a target site, the deaminase
converts C to U. The cell’s DNA-repair system
then fixes the resulting U-G mismatch by converting
it into a U-A base pair, and ultimately
to a T-A pair.
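
The deamination-and-repair sequence described above can be traced base by base. The Python sketch below is a toy model only: it treats DNA as two strings, ignores the guide RNA and the enzymes themselves, and its function names are hypothetical. The three steps mirror the text (C to U, U-G to U-A, then U-A to T-A).

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Base-by-base complement, so that result[i] pairs with strand[i]."""
    return "".join(COMPLEMENT[base] for base in strand)


def replace_base(strand, index, base):
    """Return a copy of `strand` with the base at `index` swapped."""
    return strand[:index] + base + strand[index + 1:]


def cytosine_base_edit(top, index):
    """List the (label, top strand, bottom strand) states of a C-to-T edit."""
    if top[index] != "C":
        raise ValueError("a cytosine base editor needs a C at the target position")
    bottom = complement(top)
    steps = [("start", top, bottom)]
    top = replace_base(top, index, "U")        # deaminase converts C to U
    steps.append(("deamination: C to U", top, bottom))
    bottom = replace_base(bottom, index, "A")  # repair replaces the G opposite U
    steps.append(("repair: U-G to U-A", top, bottom))
    top = replace_base(top, index, "T")        # U is ultimately replaced by T
    steps.append(("final: U-A to T-A", top, bottom))
    return steps


for label, top, bottom in cytosine_base_edit("GACTG", 2):
    print(f"{label:22s} top={top} bottom={bottom}")
```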

Additional refinements improved the 
protein’s efficiency: these included swap- 
ping Cas9 for a Cas9 ‘nickase’ that cuts the 
G-containing strand, thus nudging the cell to 
replace the G rather than the U when repairing 
the U-G mismatch. “That extra modification 
boosted our efficiencies up to levels that we 
were happy with,” says Komor, who is now 
at the University of California, San Diego. 
Dubbed BE3, the resulting protein edits 
cellular DNA with almost a tenfold higher 
efficiency than CRISPR-Cas9 and with less 
than 1% indel formation. 

The first ABE was tougher to crack. No known 
naturally occurring enzymes could chemically 
convert A to G in DNA. “It was a pretty big ask
to create an enzyme that didn’t exist and have 
it work very well,” Gaudelli says. Luckily for 
her, Liu’s lab had expertise in using microbes 
to achieve the rapid directed evolution of 
proteins. Over seven rounds of evolution and 
protein engineering, Gaudelli gradually coaxed 
a bacterial enzyme called TadA, which converts
A to G in some RNAs, to accept a DNA substrate
and work better in mammalian cells, producing 
an editor called ABE7.10. 

Although they can effect only a subset 
of possible nucleotide changes, such 
enzymes can already address the majority of 
disease-causing point mutations in humans,
at least in theory. “The adenine base editor,
in particular, corrects the most common kind
of point mutation in humans,” says Liu, referring
to G-C to A-T mutations, which account
for about half of all known pathogenic single- 
nucleotide changes. For the moment, how- 
ever, the technology is for laboratory use only. 
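
The pairing between mutation type and editor class can be made explicit. In this minimal Python sketch (names hypothetical), a pathogenic change from a reference base to a mutant base is correctable only if one editor class can perform the reverse conversion on either strand. CBEs are taken to convert C-G to T-A and ABEs A-T to G-C, as described above.

```python
# Conversions each editor class can make, written for either DNA strand.
EDITOR_CONVERSIONS = {
    "CBE": {("C", "T"), ("G", "A")},  # C-G base pair to T-A base pair
    "ABE": {("A", "G"), ("T", "C")},  # A-T base pair to G-C base pair
}


def editor_to_revert(reference, mutant):
    """Return the editor class that could turn `mutant` back into `reference`."""
    for editor, conversions in EDITOR_CONVERSIONS.items():
        if (mutant, reference) in conversions:
            return editor
    return None  # transversions are out of reach for both classes


# A G-to-A mutation (a G-C to A-T change) is reverted by an adenine base editor.
print(editor_to_revert("G", "A"))  # prints ABE
```

Of the 12 possible single-base substitutions, only the four transitions map to an editor in this model, which is the "subset of possible nucleotide changes" the text refers to.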


Correcting and creating mutations 


In initial studies, Liu’s team showed that CBEs
could correct point mutations associated with
Alzheimer’s disease and cancer in mouse and
human cell lines with an on-target editing efficiency
of 35-75% and a 5% indel rate, compared
with CRISPR-Cas9’s 0.1-0.3% efficiency and
26-40% rate of indel formation. Using ABEs,
Liu’s team corrected point mutations responsible
for a life-threatening blood-cell disorder
called hereditary haemochromatosis, as well
as sickle-cell anaemia.

Researchers have used base editors to 
create and correct animal models of human 
diseases, including Duchenne muscular dystrophy,
progeria and age-related macular
degeneration (H. Yang, unpublished observations).
“With base editors, it’s easy to create an
animal model and explore pathogenic muta- 
tions all over the genome,” says Huang, who 
has generated mouse models of diseases such 
as androgen insensitivity syndrome and syn- 
dactyly, a condition in which multiple fingers 
or toes are fused together. Huang was even
able to combine CBEs and ABEs in the same 
mouse embryos, resulting in simultaneous 
A-G and C-T edits, a trick he achieved using 
editors with different sequence preferences. 
“We can handle several mutations simultaneously
and with very high efficiencies,” he says.

Base editors can also be used to produce 
gene knockouts. The CRISPR-Cas9 system 
is particularly adept at creating knockouts, 
thanks to the natural mechanism most 
commonly used to repair double-strand DNA 
breaks. That process can add or delete bases 
at the cut site, causing the gene sequence to 
be misread and causing protein synthesis to 
stop prematurely. But CBEs can convert certain
codons (the three-base genetic words
that define the sequence of amino acids in
a protein) to a stop signal directly, an idea
that researchers are exploiting to systematically
test the effects of knocking out different
genes across the genome. As base editors
progress towards clinical trials, research- 
ers have begun testing them in non-human 
primates. In unpublished work, Hui Yang, a 
developmental biologist at the Chinese Academy
of Sciences in Shanghai, has applied base
editors in mouse and monkey models of eye 
diseases, such as age-related macular degen- 
eration, as well as Duchenne muscular dystro- 
phy and Parkinson’s disease. “Base editors just 
cause single-strand breaks, not double-strand 
breaks, so I really think it’s more safe than 
CRISPR,” says Yang. 
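
The stop-codon idea mentioned above follows directly from the standard genetic code, and a short Python sketch can enumerate the candidates. It assumes an in-frame coding sequence and considers only single-base changes a CBE could make (C to T on the coding strand, or G to A when the C on the opposite strand is edited). The function name is hypothetical, and the sketch ignores PAM and editing-window constraints.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
CBE_CHANGES = {"C": "T", "G": "A"}  # G to A reflects a C-to-T edit on the other strand


def stop_creating_edits(cds):
    """Return (position, codon, edited codon) for single edits that make a stop."""
    results = []
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for start in range(0, usable, 3):
        codon = cds[start:start + 3]
        if codon in STOP_CODONS:
            continue  # already a stop signal
        for offset, base in enumerate(codon):
            new_base = CBE_CHANGES.get(base)
            if new_base is None:
                continue
            edited = codon[:offset] + new_base + codon[offset + 1:]
            if edited in STOP_CODONS:
                results.append((start + offset, codon, edited))
    return results


print(stop_creating_edits("ATGCAATGGGCTTAA"))
```

In this toy sequence the CAA (glutamine) and TGG (tryptophan) codons are the ones that a single edit turns into stops.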

Base editors could also be used to create 
high-yield or disease-resistant plant varieties, 
says Caixia Gao, a plant biologist at the Chinese 
Academy of Sciences in Beijing. “A single nucle- 
otide change can make some rice plants better 
use nitrogen in the field, for example,” she says.


Building a better editor 


Although theoretically similar to a genetic 
search-and-replace tool, base editors are in 
practice less precise. 

The fact that base editing uses Cas9 for 
sequence targeting means that it can produce 
off-target changes, just as CRISPR-Cas9 does. 
But base-editor specificity is complicated fur- 
ther by the deaminases that actually alter the 
DNA. These enzymes can modify RNA and 
single-stranded DNA at sites other than the 
intended target. “We don’t know if these
effects will be clinically relevant or not, but it’s
wise to try to minimize any unwanted editing,” 
says Liu. 

ABEs apparently show no such off-target 
effects. This is probably because the ABE 
deaminase binds more weakly to its target than 
does the CBE deaminase, and so needs Cas9’s 
help for efficient editing, says Liu. Researchers 
have now developed higher-fidelity CBEs, such 
as HF-BE3, with weaker target binding, and 
found that they have correspondingly lower 
levels of off-target editing.

Base editors can also sometimes edit 
‘bystander’ Cs or As that lie within their ‘editing
window’, the nucleotide region within which
the enzyme works efficiently. Researchers
have created editors with narrower or broader
windows to reduce or enhance such effects.
For instance, YE1-BE3 and YEE-BE3 are modified
versions of BE3 with narrower activity
windows, whereas ABE7.9 (ref. 2) and the CBE
BE-PLUS have wider ones.
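
To see why window width matters, the following Python sketch lists the bystander bases for a given target. It assumes positions are numbered from 1 at the PAM-distal end of the protospacer, and it uses positions 4 to 8 as a placeholder window; both the numbering and the default window are assumptions for illustration, as is the function name, and real editors should be checked against their own documentation.

```python
def bystanders(protospacer, target_pos, window=(4, 8), base="C"):
    """Return 1-based positions of same-type bases sharing the editing window."""
    low, high = window
    if not low <= target_pos <= high:
        raise ValueError("target lies outside the editing window")
    if protospacer[target_pos - 1] != base:
        raise ValueError("target position does not hold the editable base")
    return [
        pos
        for pos in range(low, high + 1)
        if pos != target_pos and protospacer[pos - 1] == base
    ]


site = "GACCTCAGTTGACCATGCAA"  # made-up 20-nucleotide protospacer
print(bystanders(site, 6))                 # placeholder window, positions 4 to 8
print(bystanders(site, 6, window=(5, 7)))  # narrower window
```

With the placeholder window the C at position 4 is a bystander; narrowing the window to positions 5 to 7 leaves the target C alone.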

“If we think about genetic disease 
correction, we need to have very specific 
editing, where we need to have this activity 
window be very narrow, down to one nucleotide,”
says Gao. But an expanded editing
window could be useful for accessing multiple 
target sites, for instance to introduce several 
point mutations at once. 

Base editors are also relatively limited in 
terms of the genomic sites that they can target; 
they can only act near a protospacer adjacent 
motif (PAM), the short DNA sequence required 
for successful binding of Cas9 to a DNA tar- 
get. Because of that restriction, “I believe 
only about 25% of the pathogenic mutations 
in the human genome can be precisely edited
or corrected using current tools”, says Huang. 
Researchers have expanded base editors’ 
scope by using directed evolution to create 
Cas9 proteins that recognize a broader range 
of PAMs, and by fusing base editors to Cas9 
variants with wider PAM compatibility. 
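
The PAM restriction can be illustrated with a short search. This Python sketch scans one strand for PAM matches and reports the guides that would place a chosen base inside the editing window. The NGG motif, the 20-nucleotide protospacer, the window of positions 4 to 8 and the function names are all assumptions chosen for illustration, and the reverse strand is ignored for brevity.

```python
def pam_matches(site, pam):
    """True if `site` fits the PAM pattern, where N stands for any base."""
    return len(site) == len(pam) and all(
        p == "N" or p == s for p, s in zip(pam, site)
    )


def guides_covering(seq, target_index, pam="NGG", protospacer_len=20, window=(4, 8)):
    """Return (protospacer start, target position) for each usable guide."""
    low, high = window
    guides = []
    for pam_start in range(protospacer_len, len(seq) - len(pam) + 1):
        if not pam_matches(seq[pam_start:pam_start + len(pam)], pam):
            continue
        proto_start = pam_start - protospacer_len
        # 1-based position of the target, counted from the PAM-distal end.
        position = target_index - proto_start + 1
        if low <= position <= high:
            guides.append((proto_start, position))
    return guides


seq = "AAAAACAAAAAAAAAAAAAA" + "TGG" + "AAA"  # made-up sequence, target C at index 5
print(guides_covering(seq, 5))             # guides available with an NGG PAM
print(guides_covering(seq, 5, pam="NGA"))  # guides available with an NGA PAM
print(guides_covering(seq, 15))            # no PAM positions this base in the window
```

Each additional PAM that a Cas9 variant recognizes adds candidate guides, which is how broader PAM compatibility widens the set of editable sites.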

And then there is the issue of the limited 
range of base changes that editors can cur- 
rently produce. To correct as many genetic 
diseases as possible, base editors will need 
to perform additional conversions, such as 
C to A, C to G, A to C and A to T. Jin-Soo Kim, a
biochemist at the Institute for Basic Science
in Daejeon, demonstrated this year that
ABEs can achieve C-to-G conversion as well
as C-to-T and A-to-G conversions in a human
kidney cell line. “These results give us a hint
on how to make other types of base editors,”
he says. 

Alternatively, researchers could use a new 
class of genome editors from Liu’s lab, called 
prime editors, which can change any DNA base 
into any other. Prime editors use a special
guide RNA template and Cas9 nickase to direct
a reverse transcriptase enzyme to a target site.
There, the enzyme makes a new DNA strand
from the RNA template and inserts it at the
target (see ‘Prime corrective’). But there are a
lot of unknowns with these tools, “including 
whether we can successfully do prime editing 
in animals and whether it will be as generaliz- 
able for many different types of cells as base 
editing”, says Liu. 

With all these different options, research- 
ers will need to consider their needs care- 
fully to find the best fit for their project. For 
efficiently disrupting genes or inserting or 
replacing large DNA sequences, CRISPR-Cas9 
is the best bet, says Liu. It has been well stud- 
ied, has lots of variants with greater specific- 
ity or particular PAM affinities, and is already 
being tested in clinical trials. Prime editors 
offer the greatest flexibility for creating 
DNA insertions, deletions, point mutations 
or combinations thereof. And base editors 
are ideal for correcting point mutations, pro- 
viding higher efficiency and causing fewer 
indels. 

“I think all three of these classes of 
genome-editing agents really have comple- 
mentary strengths and weaknesses,” says 
Liu. He likens CRISPR-Cas9 to scissors, base
editors to pencils, and prime editors to word 
processors. “I think they all have their own 
roles in research and in applications such as 
agriculture and human therapeutics, just as 
scissors, pencils and word processors all have 
their own useful and unique roles.” 


As easy as CRISPR 


And just like scissors, pencils and word 
processors, base editing has been rapidly 
adopted by the scientific community, a testament
to its low barrier to entry. “If you are
familiar with genome-editing technology,
I think you are ready to do base editing,” says
Kim.


PRIME CORRECTIVE

David Liu's prime editing strategy uses an RNA
template and the enzyme reverse transcriptase (RT)
to write genomic changes into the DNA.

Figure labels: genomic DNA; nick genomic DNA; RT template including
edit; primer-binding site; reverse transcription.
DNA copy of the prime editing guide RNA is incorporated into
the target site.
Subsequent enzymatic steps repair the DNA and ensure the
edit is present on both strands.


Researchers can order base editors from the
non-profit plasmid repository Addgene. Liu
recommends starting with some of the newest
editors, such as his lab’s BE4Max and ABEMax,
which target C and A, respectively. But many
others could also fit the bill, he adds, depending
on the circumstances. (See Table 1 in ref. 17
for a good starting point.) 

Consider PAM specificity and the editing 
window required to access the target, Liu says. 
Consider also how much to prioritize reduced 
bystander editing or off-target effects. Specialized
computational tools such as beditor
can help researchers to design guide RNAs for 
their particular target. 

Still, base editors don’t always work as 
expected. “Sometimes we have to test a couple
of different editors before we find one that
likes our target,” says Komor. If nothing works,
researchers can cut and paste from different
base editors to make a custom editor, a process
that Komor says is relatively straightforward. 
“Don’t be afraid to make your own.” 

Whatever the editor, delivering it to
cells requires standard genetic techniques,
such as transfection, micro-injection and
electroporation. “You can deliver them as
protein-RNA complexes, as mRNA or as DNA,”
says Liu. Therapeutic applications, however, 
will require a different approach. 

Conventional viral delivery vectors, such as 
adeno-associated virus (AAV), carry only limited
genetic cargo, and base editors are typically
too large to fit. “Our current work is aimed at
decreasing the size of the Cas9 and base editor,
which I think will broaden its application,” says
Yang. Alternatively, researchers can split base
editors across two vectors, as Kim did to target a
mutation in the Duchenne muscular dystrophy
gene in adult mice. “We were able to correct the
mutation in skeletal muscle,” he says.

It is early days, but base editors have 
already become a promising addition to the 
genome-editing toolset. And they might 
have more tricks up their sleeves. Some edi- 
tors, for instance, can act on RNA rather than 
DNA, opening up the possibility of knocking 
down or editing mRNA transcripts containing 
pathogenic mutations. Base editors might also 
be able to target mutations in mitochondria, 
which lack the DNA-repair pathways that con- 
ventional genome editing relies on, says Kim. 

For Gaudelli, such opportunities represent 
the realization of a lifelong dream. “My moti- 
vation for being in the sciences was to make 
a difference in the world,” she says. “I never 
thought it would be through base editing.” 


Sandeep Ravindran is a science writer based 
in Washington DC. 

In my lab, we study the evolution of
underwater landscapes. We look at
density currents that plunge into an
ocean or reservoir, and how sediment and
water interact to shape the evolution of
deltas, channels and canyons. These kinds of
powerful currents occur in extreme events,
such as floods or typhoons, and often are
too hard to measure from a ship. But we can
easily reproduce them here in my lab, using
sand and coloured-water flows in a tank.

This is a dream workspace for me. But 
things were very different in 2014, when 
my master’s student and I were trying to 
recreate what is essentially a braided river 
channel on the sea floor. I borrowed space
in an old fluid-mechanics lab, and we built a
new water tank in a very small corner of this
cramped, dark lab. 

It was challenging. My student redesigned 
the small flow boxes that direct the water 
and tested them again and again. One day, 
he called me in: “Do you think these look like
submarine braided channels?” I said, “My 
god! You really did it.” But even so, working in 
that old space felt like the end of the world. 
We spent a year and a half rebuilding the
lab. Now, it is ideal for a flow and sediment 
experimentalist — wonderful, open and 
bright. We have enough tools to make an 
idea become a prototype, and tanks and 
flumes to test an idea and then to redesign if 
needed. It’s a very positive cycle for me and 
for my students. 

These days, we use very fine-grained sand 
in a tank to build the continental slope,
where the continental shelf dips into the
sea, then we inject denser salt water to flow
across the slope. We use fluorescent dyes 
to visualize the water flow and see how it 
controls erosion or deposition. 

During an experiment, this means that the 
lab is like a darkroom lit up by fluorescent 
water. It’s a wonderful, vivid experience 
when you see the phenomenon that you 
generated unfold. In many ways, it’s like 
being on a movie set.


Steven Yueh Jen Lai is an associate professor 
in the department of hydraulic and ocean 
engineering at National Cheng Kung University 
in Taiwan. Interview by Kendall Powell. 
Political instability and escalating trade tensions have dominated
2019. As US-China relations deteriorate and the uncertain consequences
of the United Kingdom’s planned departure from the European Union loom,
science cannot escape the fray. In the United Kingdom, cross-border
funding programmes with Europe are increasingly vulnerable, and in the
United States, a government-led crackdown to root out alleged espionage
is complicating joint
efforts between American and Chinese researchers. 

The full extent of any damage to research output is not yet clear, but early
signs in UK and EU collaborative science are worrying. Between 2015 and 
2018, the United Kingdom’s annual share of EU research funding fell by 
€430 million (US$473 million), largely owing to a reduction in grant 
applications. The number of UK-EU collaborative articles published 
in journals tracked by the Nature Index grew by just 1.3% from 2016 to
2018, although the stagnation seems unlikely to have been caused by 
the Brexit vote given the long lead times for research publication. By 
contrast, US-China collaborative articles jumped by 32% during the 
same period. 

“There shouldn't be any barriers,” says US-based neuroscientist
Hongkui Zeng, who relies on imaging expertise in China to investigate 
brain functioning. The stories of international partnerships you'll read 
about in this supplement are illustrative of resilience against political
pressure. They are stories of mutual admiration, not animosity, and 
a desire for openness, not secrecy. And there is a lot at stake. In the 
journals covered by the Nature Index, the number of internationally 
collaborative articles across the four broad subject areas has risen 
by between 21% (physical sciences) and 48% (chemistry) since 2012. 
Publications with thousands of authors are becoming more common in
fields known for conducting big science projects, such as high-energy
physics, genetics and oncology, as highlighted in this supplement. 

This rise in collaborative research has been driven by necessity. As 
global challenges become increasingly complex, the most valuable 
teams are those with interdisciplinary skill sets and diverse perspectives.
For research, the truism that great things are seldom done alone
has never been more accurate.


Bec Crew 
Shared endeavours, such as this energy study by the Harvard China Project, continue to flourish.


Science weathers 
political ill wind 


Despite government tensions, research 
collaboration between China and the United 
States remains strong. By Chris Woolston 


Some projects are too big for a single
lab. Or, for that matter, a single coun- 
try. Hongkui Zeng, a neuroscientist at 
the Allen Institute for Brain Science 
in Seattle, Washington, is working on 
an ambitious project that spans the Pacific. 
Her team is attempting to untangle the sub- 
tle structural differences among groups 
of neurons in the mouse neocortex, where 
higher cognitive functioning such as sensory 
perception and spatial reasoning is processed.
The pursuit requires major assistance from sci- 
entists in China, whose international research 
presence is strong, despite growing mistrust 
from the US government.

To get a clearer picture of the mouse neocortex,
Zeng depends on high-resolution neuron
images from Huazhong University of Science
and Technology, in China (HZAU). “They have a
unique bioimaging centre,” she says. Access to
these images allows researchers to more closely
examine the shape and structure of the axons 
that carry electrical messages from neurons. 
“The morphology helps us understand how 
neurons form networks and communicate with 
each other,” she says. “It’s critical information.”

In science as in cortexes, networks are cru- 
cial. According to the Nature Index, connec- 
tions between the United States and China 
are stronger than ever. The number of papers 
co-authored by the two countries in the 82 
high-quality journals tracked by the index 
leapt from 3,413 in 2015 to 4,631 in 2018. Chi- 
nese authors collaborated with researchers 
in the United States more often than in any 
other region, and China is second only to the 
European Union as the collaborator of choice
for researchers in the United States. 

These partnerships have formed against 
a backdrop of political tensions, economic 
tariffs, and even fears of academic espio- 
nage. In September, US federal prosecutors 
announced the arrest of Zhongsan Liu, head 
of the New York office of the China Associa- 
tion for International Exchange of Personnel, 
for his alleged involvement in a conspiracy to
fraudulently obtain US visas for Chinese gov- 
ernment employees. A month earlier, the Fed- 
eral Bureau of Investigation arrested Chinese 
chemist Feng Tao, an associate professor at
the University of Kansas, for allegedly failing to 
disclose his full-time employment with Fuzhou 
University in China. 

Earlier this year, the MD Anderson Cancer 
Center in Houston, Texas, sacked three Chi- 
nese scientists over alleged theft of research 
data for China, and Science magazine reported 
that the US National Institutes of Health (NIH) 
sent letters to 77 institutions warning of col- 
laboration with scientists who may have ties to 
foreign governments, including China. 

Such incidents have caused consternation 
and unease among Chinese researchers and 
their collaborators in the United States, Zeng 
says. But scientific cooperation remains strong 
for a fundamental reason: each country has 
intellectual and material scientific resources 
that transcend political boundaries. “Science 
needs a free exchange of information,” says 
Zeng. “There shouldn't be any barriers.” 

Cooperation flows in both directions. In 
2018, Zeng and Allen Institute geneticist
Linda Madisen were co-authors on a paper
that presented a three-dimensional atlas of 
the cholinergic system in the mouse brain, 
an important model for Alzheimer’s disease 
research. The study, published in the Proceed- 
ings of the National Academy of Sciences, was 
led by researchers at HZAU, and was based on 
mouse cell lines created by Zeng and Madisen 
in the United States. “We develop tools and 
resources and share them with the scientific
community,” says Zeng. “They should be avail- 
able to anybody.” 

Despite the NIH’s high-profile warnings regarding
foreign research partnerships, collaboration
also continues at the agency. Christopher Buck,
a virologist at the National Cancer Institute’s 
Center for Cancer Research, a division of the 
NIH, credits Chinese researchers for playing a 
major role in his latest finding: the discovery 
of 12 new types of small DNA tumour viruses,
infectious agents with the potential to cause 
cancer. He found the first hints of a new virus 
in fish DNA sequenced by researchers at the 
BGI Academy of Marine Sciences in Shenzhen, 
China, led by Chao Bian. 


Available to anybody 


Bian’s team is part of BGI Marine, the agri- 
cultural arm of China’s BGI Group, a gene-se- 
quencing company that runs the largest 
genetics research centre in the world. The BGI 
researchers didn’t have viruses in their sights 
when they originally shared DNA sequences 
with the NIH researchers. They were investi- 
gating the genetic make-up of the green Asian 
arowana (Scleropages formosus), a prized 
aquarium fish that can fetch several hundred 
thousand dollars on the market. 

When Buck informed them of the intriguing 
viral sequences, they returned to their samples 
and sequenced the entire viral genome. “They 
kindly and bravely trusted that this strange 
person from across the ocean had a valid 
hypothesis,” he says. A preprint of the study 
was published in bioRxiv in August. 


BGI’s sequencing powers fuel a significant 
portion of China-US scientific collaboration. 
In the Nature Index, BGI is involved in 7% (see 
‘Big picture science’, S28) of collaborative 
articles involving at least one corporate insti- 
tution publishing in genetics between 2015 
and 2018. 

“I have great admiration for BGI,” Buck says.
“They’re sequencing all sorts of interesting 
things, and they’re sharing what they find. It 
takes teams of specialists to look at the data 
and see what’s going on.” He adds that no one 
at the NIH has ever warned him about sharing
sensitive materials with researchers from
China or any other country. “I have no secrets,”
says Buck. “Taxpayers are paying me to create 
scientific knowledge and broadcast it as widely 
as possible.” 

Harvard University’s long history of schol- 
arship on China and deep ties with the country 
has set the stage for ongoing cooperation in 
many scientific fields, says Michael McElroy, 
a climate scientist and chair of the Harvard 
China Project, a joint effort of several Harvard
colleges established in 1993 to understand and 
tackle energy, environmental and economic 
issues in China and beyond. 

For the past few decades, hundreds of
Chinese graduate students who came to Harvard
as part of the project have returned to
their home country to become government 
officials or scientists. Those students are part 
of a larger trend: nearly 10,000 researchers, 
mostly of Chinese origin, moved from the 
United States to China in 2017 alone. 


Great admiration 


Those networks have led to many collabo- 
rations, including a Nature Sustainability 
paper published in July suggesting that China
may be five to ten years ahead of schedule in 
meeting its Paris Agreement pledge to curb 
carbon dioxide emissions by 2030. The paper 
was co-authored by McElroy and a group of 
Chinese researchers led by Haikun Wang, a 
researcher at Nanjing University and one of 
McElroy’s former graduate students. 

McElroy says that much of the funding for 
the Harvard China Project now comes from 
the Harvard Global Institute, a sign of change. 
“In the early days, we had funding from the 
National Science Foundation and NASA,” he 
says. “Today, there’s no possibility of getting 
funding from official US sources for China-re- 
lated work. It’s discouraged in Washington.” 

Despite the sometimes uneasy climate, 
McElroy predicts that collaboration between 
US and Chinese scientists will continue to pro- 
pel science in both countries. “The strength of 
cooperative scholarship has not declined,” he 
says. “If anything, it’s growing.” 


Chris Woolston is a freelance science writer in 
Billings, Montana. 


PROLIFIC PAIRING 


Growth in the number and strength of institution-to-institution research relationships shows no sign of slowing, despite political tensions between the US and China. 


Bilateral partnerships 
Research relationships between US and Chinese 
institutions grew strongly in the 6 years to 2018. 
Big picture science


Across continents and research fields, big science is a global enterprise, 
yet even for leading collaborators, the strongest partnerships are mainly local. 
Data analysis by Bo Wu; infographic by Alisdair MacDonald 


COLLABORATIVE 
CLUSTERS 


The infographic shows the top 
25 research partners of big 
science leading collaborators 
in 3 fields: high-energy physics, 
life sciences and genomics. 


The Nature Index ranks 
institutions in the big science 
fields by their fractional counts 
(FC), referring to the share of 
their affiliated authors’ 
contributions, and article 
counts (AC) in 82 high-quality 
journals. The table rankings 
(pages S39-S42) are for high 
affiliation articles only, meaning 
those with authors from 10 or 
more separate principal 
institutions. 


The partner relationships
shown are for the US National
Institutes of Health (NIH), which
ranks 2nd among the world’s
top institutions for producing
big science research articles in
the field of oncology and
immunology (see page S42)
and 3rd in the field of genetics
(see page S39); the European
Organization for Nuclear
Research (CERN), in
Switzerland, which is the 3rd
biggest contributor to big
science articles in physics and
astronomy in the Nature Index
(see pages S40-S41); and BGI, a
genome sequencing company
that is China’s biggest
contributor to big science in
genetics (see page S39). This
infographic is based on all 
collaborative articles from the 
three institutions identified, 
regardless of the number of 
affiliations. 
Common collaborations

The NIH (US, life sciences) and 
BGI (China, genomics) have 3 top 
25 collaborators in common: 
Harvard University, Johns Hopkins 
University and the Chinese 
Academy of Sciences. 


S28 | Nature | Vol 575 | 21 November 2019


© 2019 Springer Nature Limited. All rights reserved. 


LEGEND

Institutions: NIH, CERN, BGI. Collaboration score (CS) is shown by bubble size (scale: CS 5, CS 100, CS 200). Top CS ranking (1 to 25) is shown by line weight.

The top 25 collaborators of the three central
institutions are shown according to their joint
collaboration score (CS) with the central institution,
derived by summing the FCs* from articles with
authors from both institutions. CS determines the
size of the partner institutions’ bubbles. The rank
from 1 to 25 of their CS with the central institution is
indicated by their line weight.
* For a definition of FC, see Collaborative clusters.
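
As a rough illustration of how such a score could be tallied, the Python sketch below computes fractional counts and a collaboration score. It rests on three assumptions that the legend does not spell out: each article carries a total FC of 1 split equally among its authors, an author with several affiliations divides their share equally among them, and the score adds both partners' FCs on every article they share. The function names and the sample author lists are hypothetical and are not drawn from Nature Index data.

```python
from collections import defaultdict


def fractional_counts(author_affiliations):
    """Return {institution: FC} for one article.

    author_affiliations is a list with one entry per author; each entry is
    the list of institutions that author is affiliated with.
    Assumption: every author receives an equal share of the article's total
    FC of 1, split equally across that author's affiliations.
    """
    share_per_author = 1.0 / len(author_affiliations)
    fc = defaultdict(float)
    for affiliations in author_affiliations:
        for institution in affiliations:
            fc[institution] += share_per_author / len(affiliations)
    return dict(fc)


def collaboration_score(articles, inst_a, inst_b):
    """Sum both institutions' FCs over the articles they share (assumed rule)."""
    total = 0.0
    for article in articles:
        fc = fractional_counts(article)
        if inst_a in fc and inst_b in fc:
            total += fc[inst_a] + fc[inst_b]
    return total


# Hypothetical author lists, for illustration only.
sample_articles = [
    [["NIH"], ["BGI"], ["Harvard University"], ["NIH", "BGI"]],  # shared article
    [["NIH"], ["NIH"]],                                          # NIH only, ignored
]
score = collaboration_score(sample_articles, "NIH", "BGI")
```

Under these assumptions, an article on which the two partners supply most of the authors adds more to the score than one on which they are minor contributors among many institutions.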
Near and far

Of the top 25 collaborators of
Chinese genome-sequencing
company BGI, 13 are in China, 4 are in
the US, 3 are in Denmark and 2 are in 
the UK, while Australia, Canada, and 
Saudi Arabia each have 1. 


Brexit shadow hangs 
over EU partnerships 


Uncertainty about the United Kingdom’s role in EU science 
is damaging research networks. By Mark Peplow 


More than three years after UK citizens
narrowly voted for their country to
leave the European Union, scientists
still face great uncertainty about how
Brexit will impact their research.
The United Kingdom has already missed 
three deadlines for leaving the EU. Although the 
terms of the country’s withdrawal from the bloc 
have been agreed, political manoeuvring has 
stalled the ratification of this treaty. To break 
the deadlock, the United Kingdom is gearing 
up for a general election on 12 December, and 
faces a new Brexit deadline of 31 January 2020.
Even if the withdrawal treaty is ratified, it 
does little to resolve the country’s long-term 
links with European science, which would be
thrashed out in subsequent negotiations. If 
the United Kingdom leaves the EU next year 
without a deal in place to smooth the way, UK 
scientists could immediately lose access to the 
EU funding and collaborations that underpin 
their research. 

This corrosive confusion is already reshap- 
ing collaborative networks. Some research- 
ers are securing dual appointments that will 
enable them to straddle the United Kingdom 
and EU, and various UK universities have established
partnership agreements with continental
institutions so that their staff can continue
to access EU funds.
“They want a Brexit-proof solution,” says
Kurt Deketelaere, secretary-general of the 
League of European Research Universities in 
Leuven, Belgium. 

These changes could have far-reaching 
effects on existing partnerships. UK researchers
collaborate more with researchers in other
EU countries than in any other region. According
to the Nature Index, collaborative articles by 
UK-EU researchers in five leading journals 
(Nature, Science, Proceedings of the National 
Academy of Sciences, Nature Communications 
and Science Advances) grew by 36% between 
2015 and 2018, although for reasons still not 
clear, the growth in collaborative articles 
across all 82 journals tracked by the index
has stalled since before 2016, when the vote 
to leave the EU was taken. 

The United Kingdom has also benefited 
significantly from EU research programmes. 
Huge schemes such as Horizon 2020, the 
EU’s €77 billion (US$85 billion) research and 
innovation programme, have provided a use- 
ful protocol for cross-border collaboration, 
facilitated by funding and helpful immigration 
arrangements. “That’s a magic combination 
for doing research,” says Graeme Reid, chair 
of science and research policy at University 
College London (UCL). 

That success story is souring. In October, 
the Royal Society pointed out that the United 
Kingdom’s annual share of EU research fund- 
ing fell from €1.49 billion in 2015 to €1.06 bil- 
lion in 2018, largely caused by a reduction in 
grant applications from UK researchers. It also 
found that the number of researchers coming 
to the United Kingdom through the EU’s Marie
Skłodowska-Curie Fellowships has fallen by
35%, from 515 in 2015 to 336 in 2018. 
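
The figures quoted by the Royal Society can be checked with a few lines of arithmetic. The Python sketch below uses only the numbers given above; the helper name `change` is illustrative.

```python
def change(before, after):
    """Return the absolute change and the percentage change from before to after."""
    absolute = after - before
    percent = 100.0 * absolute / before
    return absolute, percent


# Annual UK share of EU research funding, in billions of euros (2015 vs 2018).
funding_abs, funding_pct = change(1.49, 1.06)

# Researchers arriving through the fellowships (2015 vs 2018).
fellows_abs, fellows_pct = change(515, 336)

print(f"Funding change: {funding_abs * 1000:.0f} million euros ({funding_pct:.1f}%)")
print(f"Fellowship change: {fellows_abs} researchers ({fellows_pct:.1f}%)")
```

The funding drop works out to €430 million, and the fall in fellowship holders rounds to 35%.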


Magic combination 


“We have seen a dramatic drop in the number 
of leading researchers who want to come to 
the UK,” says Venki Ramakrishnan, president 
of the Royal Society. “People do not want to 
gamble with their careers, when they have no 
sense of whether the UK will be willing and able 
to maintain its global scientific leadership.” 

Brexit comes at a crucial time for the future
of EU science. The next research and innova- 
tion programme, Horizon Europe, will run 
from 2021 to 2027 and is expected to disburse 
about €100 billion. Almost half of this money 
is likely to go to large academia–industry collaborations
in areas such as health, climate and
food. The final budget and other details should 
be agreed by the end of 2020. 

If Brexit goes ahead with a withdrawal agreement
in place, most researchers hope it will pave
the way for the United Kingdom to participate
in Horizon Europe as an associate member, an
option the UK government says it will consider.
This might involve paying into the central fund, 
so that UK researchers can apply for grants in 
the same way as EU members, albeit with little 
influence over the programme's strategy. 

Leaving without a deal would probably 
stymie the chances of associate membership 
altogether, and leave a huge question mark over 
the future status of the 17% of scientists working 
in the United Kingdom who are from other EU
countries. It would also have a sudden impact 
on UK participation in ongoing EU collabora- 
tions, particularly affecting UK researchers 
responsible for the management and finance 
of EU projects. 


Peter Coveney, a computational scientist at 
UCL, coordinates two major projects that apply 
advanced computing to biological modelling, 
backed by €12 million in Horizon 2020 funding.
In September, he learnt that the European Com- 
mission would ask UK project coordinators to 
step down in the event of a no-deal Brexit. To 
counter that, Coveney is ready to move the 
management of his projects to the continent. In 
March, he accepted a professorship in applied 
high-performance computing at the University
of Amsterdam in the Netherlands, which he
holds in addition to his existing UCL roles. “It’s
a Brexit mitigation strategy,” he says.

Leading UK universities are adopting a simi- 
lar approach, signing cooperation agreements 
with partners in continental Europe to inten- 
sify collaboration, establish joint research 
programmes and exchange staff and students. 
The University of Cambridge has partnered 
with the Ludwig Maximilians University of 
Munich in Germany, for example, and the 
University of Oxford has established an office 
in Berlin to facilitate partnerships with insti- 
tutions there. 

Some regions of the United Kingdom are par- 
ticularly vulnerable to Brexit, and may strug- 
gle to adapt. Northern Ireland, for example, 
depends heavily on collaborations with partners
from the Republic of Ireland, a separate
country and EU member state. “Sixty-three
per cent of Horizon 2020 applications from 
researchers in Northern Ireland involved a part- 
ner from the republic,” says Gerry McKenna, 
who chairs the Royal Irish Academy’s North- 
South Committee, which is concerned with 
collaboration across the Irish border. “There’s 
no way that Brexit won't have some negative 
impact on research collaboration.” 

Amid these damage limitation efforts, 
Reid has been looking farther afield for ways 
to diversify the United Kingdom’s research 
collaborations. He and statistician Adrian
Smith, who leads the London-based Alan
Turing Institute, have written a report for the 
UK government that outlines opportunities 
for international collaboration if the country 
decides not to associate with Horizon Europe. 
Published on 5 November, the report suggests 
measures such as dedicated funding streams to
enhance global collaboration, and new fellow- 
ships to attract talent to the United Kingdom. 
Reid hopes that the report will demonstrate 
that there are alternatives to traditional EU-
focused collaborations. 

The UK government has also tried to reassure
the domestic research community with a
string of policy announcements, such as reaffirming
a commitment to increase national
spending on research and development to at 
least 2.4% of GDP by 2027. 

It also said that international students will be
allowed to remain in the country for up to two
years after graduation, so they can seek employment
in the United Kingdom, and promised
fast-track visa routes for foreign researchers. 
“I think that’s an enormous step forward,” says 
Martin Smith, a policy manager at Wellcome. 


Fast track 


However, the UK government’s rhetoric has 
focused on attracting ‘the brightest and 
best’ from abroad. Smith says the new visa 
systems must cover all levels of the scientific 
workforce, including lab technicians and post- 
doctoral researchers, and adds that the costs 
of the system will be a key issue for early-career 
researchers. 

If there is a no-deal Brexit, the UK govern- 
ment has promised that it will honour funding 
for all successful competitive UK bids to Horizon
2020 up to the end of next year, potentially
costing hundreds of millions of pounds. But 
this overlooks the contribution of other EU 
funding streams. 

EU structural funds, for example, are used 
to boost the economic development of EU 
regions that may be lagging behind, by invest- 
ing in projects that can enhance innovation 
and create jobs. 

“Structural funds have been crucial in build- 
ing up the research base in Northern Ireland 
to an internationally competitive level,” says 
McKenna. Several research facilities in North- 
ern Ireland depended on such structural fund- 
ing, he says, including the Northern Ireland 
Science Park. 

Funding aside, the United Kingdom may 
simply become less attractive to international 
researchers if it adopts a hostile stance towards
the EU during future negotiations. “This is not 
just a financial equation,” says Reid. “An environment
that is welcoming and nurturing is critical.”

Deketelaere says that top researchers are 
already choosing to take their research out 
of the country. “In continental Europe, we’re 
seeing an enormous influx of unsolicited 
applications for jobs in our universities, from 
excellent people,” he says. This reshaping of 
the research landscape could become Brexit’s 
lasting legacy for science. 


Mark Peplow is a science writer based in 
Cambridge, UK. 
Conducting the
multi-author choir 


Large research teams can produce higher impact work than
scientists who go it alone, but organizing a paper produced
by multitudes can be a major challenge. By Jack Leeming


Andrew Shepherd and his team collect cores from the George VI Ice Shelf, Antarctica, in their work estimating ice loss from polar caps. 


For a frog, exposure to the amphibian
chytrid fungus (Batrachochytrium 
dendrobatidis) is very bad news indeed. 
The fungus thrives in the same wet, 
hot conditions that frogs favour and 
it grows on amphibian skin. Frogs breathe 
through their skin, which is used by almost 
all species for electrolyte exchange. Chytrid 
prevents electrolytes from entering the 
animal’s body, which eventually causes a 
heart attack. 

Chytrid fungus species are responsible 
for significant amphibian population reduc- 
tions in Central and North America, Europe 
and Australia. Although declines were at
their worst in the 1980s, one 2004 study sug- 
gested that at least 43% of amphibian species 
are dwindling worldwide. New Guinea, home 
to 6% of the world’s frog species, is one place 
chytrid is yet to invade. 

Deborah Bower, an ecologist at the Univer- 
sity of New England in Armidale, Australia, is 
investigating proactive protection strategies 
for New Guinea, including increased quarantine
measures and an island-wide surveillance
programme. In 2015, she collaborated with 
29 other scientists on these and other rec- 
ommendations. The results were published 
in June 2019 in Frontiers in Ecology and the
Environment. 

Such collaboration is unusual in Bower’s 
field, where single-author papers are com- 
mon. “When the fungus gets to New Guinea, 
more than 100 frog species could go extinct,” 
she says. “The island has a complex political 
system; it’s half Papua New Guinea and half 
Indonesia. There’s not much local experience 
in dealing with the disease. We brought in sci- 
entists from the US and Australia who had 
experience with chytrid, plus experts from 
a policy background who have worked with 
governments on large-scale changes.” 
Bower hopes the paper will have more
impact than something she could have pro- 
duced alone. “We had authors from five dif- 
ferent countries, which gave us access to more 
skills. One of the co-authors did climatic mod- 
elling on the fungus, for example.” 

Another bonus, adds Bower, is a more 
refined document. “There are more eyes going 
over it,” she says. 

At the same time, this presented a chal- 
lenge. Organizing schedules was difficult, 
almost as much as fielding multiple pages of 
feedback. “Getting comments from 29 peo- 
ple is overwhelming. I had maybe 20 differ- 
ent documents to go through. A colleague 
travelled 460 kilometres from Sydney and 
we sat down with two computers, one with
tracked changes and one with comments, to 
manage it.” 

Another struggle was editorial policies 
and style guides. In 2017, Science accepted 
a ‘perspective’ article from Bower and four 
co-authors to outline work in progress. “I had 
to ditch 25 authors,” she says, noting that the 
journal sets a limit of five authors for its per- 
spective articles. 


Greater impact 


Scientists are co-authoring more than ever 
before. In 2016, The Economist reviewed more 
than 34 million research papers published 
between 1996 and 2015, and found that the 
average author numbers grew from 3.2 to 
4.4 per paper. A 2018 Nature Index analysis 
found that the field of high-energy physics 
is largely responsible for the rise of papers 
authored by more than 1,000 individual
scientists in recent years. A 2014 collaboration
estimating the size of the Higgs boson,
for example, listed a record-breaking 5,154 
authors. 

Papers in physics and astronomy, genetics,
and oncology and immunology were also identified
by Nature Index as the most likely to
have long author lists.

CERN’s particle-collision experiment, 
ATLAS, which is designed to test the standard
model of physics, makes use of one of
two general-purpose detectors at the Large
Hadron Collider in Switzerland, and regularly 
produces physics megapapers. Karl Jakobs, 
a physicist and ATLAS collaboration spokes- 
person, is part of a publishing group that 
makes up the 3,000-strong team. “In terms 
of who actually writes the paper, we assign two 
or three editors, called an editorial team,” says 
Jakobs. “They discuss, present an outline and 
write the paper, with input from the scientists 
who provided the data.” 

Papers are then subject to a series of internal 
peer reviews and the editorial team takes in 
contributor feedback, followed by an external 
institutional review from a collaborating physics
department. The extent of this back-and-forth
editing before a paper is even submitted
to a journal might seem intimidating for some,
but “it’s not a nightmare”, says Jakobs, because
the collaboration and review process is well
defined and organized in a way that makes
sense to the participants.


BIG PHYSICS PUBLICATIONS PATTERNS


The heat map shows the quantity of research papers according to the number of authors published
by 4 large collaborative physics experiments at CERN in Switzerland (ATLAS, LHCb, CMS and ALICE)
as well as the Laser Interferometer Gravitational-Wave Observatory (LIGO) in the US.


Big personalities 


The ATLAS collaboration published more than 
100 papers last year, but similar efforts can be 
a struggle in other fields. Andrew Shepherd, an
Earth observation scientist at the University 
of Leeds, UK, leads a publishing consortium 
of 96 researchers who estimate ice loss
from polar ice caps. 

“In 2010, when we started working on this, 
there were probably around 50 individual esti- 
mates in the literature for how much ice was 
being lost,” Shepherd says. “The project was 
established to shed light on why there were 
such large differences between individual esti- 
mates, and then to produce a single estimate 
for the community. 

“The main challenge is some very large 
characters in the project who have very dif- 
ferent opinions as to whether ice is being lost 
or gained, for instance,” he says. 

Shepherd explains that organizing the input 
of almost 100 experts on a topic of great interest
to the press and public can be difficult. “It’s
very intensive on my part. I calculated that, in 
the first assessment, I sent 5,000 e-mails on the 
project. We’ve done three assessments now. 
I haven’t counted those e-mails because it’s 
quite depressing, but it’s probably about the 
same,” he says. 

The ice sheet mass balance inter-compar- 
ison exercise (IMBIE), which has produced 
high-profile papers in Science (2012) and 
Nature (2018), “ate into my summer holi- 
days”, Shepherd admits. “I’ve spent proba- 
bly about two hours every morning of my 
vacation this year dealing with the minu- 
tiae of publishing this most recent paper. It 
demands a lot of time.”

Shepherd says this time investment has its 
rewards. The group’s first publication estimated
ice loss of around 4,000 gigatonnes
from the Antarctic and Greenland ice sheets 
since 1992. The paper has clocked more than 
800 citations since 2012, which is “very, very 
high for the climate sciences”, Shepherd says. 
It also has a high Altmetrics score, which measures
interest from social media and online
news audiences. 

“We have a broader impact,” says Shepherd.
“The US Environmental Protection Agency has 
been using our data for the past four or five 
years as a climate indicator, for example. It’s
a very rewarding project.”


Jack Leeming is an editor for Nature based in 
London. 